
WO 01/12659 PCT/IB00/01496 



Query: 168 QPNGPPEAEPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQR 227 

P PPSP+A+SSK P+P+ + + + 

Sbjct: 737 PPAPVSSPPPTPVSSPPALAP-VSSPPSVKSS--PPPAPLSSPPPAPQVKSSPPPVQVSS 793 

Query: 228 PPQAPKKSPKAPPPVARKPSVGVPPPASPSYPRAEPLTAPP 268 

PP APK SP P+A P V PP + P PL++PP 
Sbjct: 794 PPPAPKSSP PLA--P-VSSPPQVEKTSPPPAPLSSPP 827 

Score = 165 (24.8 bits), Expect = 6.0e-09, P = 6.0e-09 
Identities = 79/264 (29%), Positives = 105/264 (39%) 



Query: 5 PPPEEAFFSVASPEPAG-PSGSP--ELVSSPAASSSSATALQIQPPGSPDPPP-APPAPA 60 

PPP + + + P P G PS P +VS PS P GSP PP +PP PA 

Sbjct: 517 PPPVK TTSPPAPIGSPSPPPPVSVVSPPPPVKSPPPPA PVGSPPPPEKSPPPPA 570 

Query: 61 PASSAPGHVAKLPQKEPVGCSKG GGPPREDVGAP LVTPSLLQMVRLRSVGAPGG 114 

P +S P V P V PP V +P + +P V AP 

Sbjct: 571 PVASPPPPVKSPPPPTLVASPPPPVKSPPPPAPVASPPPPVKSPPPPTPVASPPPPAPVA 630 

Query: 115 APTPALGPSAPQKPLRRALSGRASPVPAP SSGLHAAVRLKACSLAASEGLSSAQPNG 171 

+ P + P P+ SP P P S+ S+ +S + P 

Sbjct: 631 SSPPPMKSPPPPTPVSSPPPPEKSPPPPPPAKSTPPPEEYPTPPTSVKSSPPPEKSLP-- 688 

Query: 172 PPEAEPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQA 231 

PP P PP T SK P SPE + + V+ + PP A 

Sbjct: 689 PPTLIPSPPPQEKPTPPSTPSKP PSSPEKPSP-PKEPVSSPPQTPKSSPPPA 739 

Query: 232 PKKSPKAPPPVARKPSVGV--PPPASPSYPRAEPLTAPP 268 

P SP P PV+ P++ PP+ S P PL++PP 

Sbjct: 740 PVSSPP-PTPVSSPPALAPVSSPPSVKSSPPPAPLSSPP 777 



Score = 162 (24.3 bits), Expect = 1.3e-08, P = 1.3e-08 
Identities = 76/272 (27%), Positives = 99/272 (36%) 



Query: 2 ADFPPPEEAFFSVASPEPAG-PSGSPELVSSPAASSSSATALQIQPPGSPDPPPAPPAPA 60 

A P P SPEP PSP P + SA PPPP +PPA + 

Sbjct: 427 ASAPMPSPHTPPDVSPEPLPEPSPVPAPAPMPMPTPHSPPADDYVPPTPPVPGKSPPATS 486 

Query: 61 PASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTP-- 118 

P+ A P V S PP+ VG+P P . V+ S AP G+P+P 
Sbjct: 487 PSPQVQPPAASTPPPSLVKLS PPQAPVGSP--PPP VKTTSPPAPIGSPSPPP 536 

Query: 119 ALGPSAPQK-PLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPE 174 

+ PPKP AGSPP S A S ++PP 

Sbjct: 537 PVSVVSPPPPVKSPPPPAPVG--SPPPPEKSPPPPAPVASPPPPVKSPPPPTLVASPPPP 594 

Query: 175 AEPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKK 234 

+ PP +P ++ + P P A + + PP P+K 

Sbjct: 595 VKSPPPPAPVASPPPPVKSPPPPTPVASPPPPAPVASSPPPMKSPPPPTPVSSPPP-PEK 653 

Query: 235 SPKAPPPVARKPSVGVPPPASPSYPRAEPLTAPPTNGLP 273 

SP PPP P PP P+ P + + PP LP 
Sbjct: 654 SPPPPPPAKSTP PPEEYPTPPTSVKSSPPPEKSLP 688 



Score = 159 (23.9 bits), Expect = 2.8e-08, P = 2.8e-08 
Identities = 77/264 (29%), Positives = 103/264 (39%) 



Query: 5 PPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPPGSP — DPPPAP PAP 59 

PPP V+SP P P SP P SS ++ PP +P PP P P P 

Sbjct: 916 PPPA MVSSP-PMTPKSSPP PWVSSPPPTVKSSPPPAPVSSPPATPKSSPPP 966 

Query: 60 APASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPA 119 

AP + P V P PV S P AP+ +P + V+ AP +P P 

Sbjct: 967 APVNLPPPEVKSSPPPTPVS-SPPPAPKSSPPPAPMSSPPPPE-VKSPPPPAPVSSPPPP 1024 

Query: 120 LGPSAPQKPLRRALSG-RASPVPAPSSGLHAAVRLKACSLAASEG LSSAQPNGPPEA 175 

+ P P+ ++ P PAP S V+ S +SPP+ 

Sbjct: 1025 VKSPPPPAPVSSPPPPVKSPPPPAPVSSPPPPVKSPPPPAPISSPPPPVKSPPPPAPVSS 1084 

Query: 176 EPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKS 235 

P P +SP A S ++ P P A + A ++ S PP AP S 

Sbjct: 1085 PPPPVKSPPPPAPV SSPPPPIKSPPPP APVSSPPPAPVKPPS— LPPPAPVSS 1135 

Query: 236 PK--APPPVARKPSVGVPPPA-SPSYPRAEPLTAPP 268 

P P+K +PPPA S P + PP 

Sbjct: 1136 PPPVVTPAPPKKEEQSLPPPAESQPPPSFNDIILPP 1171 



Score = 143 (21.5 bits), Expect = 1.8e-06, P = 1.6e-06 
Identities = 59/179 (32%), Positives = 77/179 (43%) 

Query: 3 DFPPPEEAFFSVASPEP-AGPSGSPELVSSPAASSSSATA-LQIQPPGSP-- DPPP A 55 

+ PPPE S P P + P +P+ PA SS ++ PP +P PPP + 

Sbjct: 970 NLPPPEVK--SSPPPTPVSSPPPAPKSSPPPAPMSSPPPPEVKSPPPPAPVSSPPPPVKS 1027 

Query: 56 PPAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGA 115 

PP PAP SS P V P PV PP + P S V+ AP + 

Sbjct: 1028 PPPPAPVSSPPPPVKSPPPPAPVSSPP PPVKSPPPPAPISSPPPPVKSPPPPAPVSS 1084 

Query: 116 PTPALGPSAPQKPLRRALSG-RASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPE 174 

P P + P P+ ++ P PAP S A +K SL +SS P PP 

Sbjct: 1085 PPPPVKSPPPPAPVSSPPPPIKSPPPPAPVSSPPPAP-VKPPSLPPPAPVSS--P--PPV 1139 

Query: 175 AEPRPPQ 181 
P PP+ 

Sbjct: 1140 VTPAPPK 1146 

Score = 133 (20.0 bits), Expect = 2.3e-05, P = 2.3e-05 
Identities = 50/132 (37%), Positives = 59/132 (44%) 

Query: 1 MADFPPPEEAFFSVASPEPAGP-SGSPELVSSP AASSSSATALQIQPPGSP— DPPP 54 

M+ PPPE V SP P P S P V SP A SS ++ PP +P PPP 

Sbjct: 1001 MSSPPPPE VKSPPPPAPVSSPPPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSPPP 1055 

Query: 55 APPAPAPASSAPGHVAKLPQKEPVGCSKG GGPPREDVGAPLVTPSLLQMVRLRS 108 

+PP PAP SS P V P PV PP V +P P + 

Sbjct: 1056 PVKSPPPPAPISSPPPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSP--PPPIKSPPPPAP 1113 

Query: 109 VGAPGGAPT--PALGPSAP 125 

V +P AP P+L P AP 
Sbjct: 1114 VSSPPPAPVKPPSLPPPAP 1132 

Score = 110 (16.5 bits), Expect = 8.0e-03, P = 8.0e-03 
Identities = 41/121 (33%), Positives = 49/121 (40%) 

Query: 5 PPPEEAFFS VASPEPAGP-SGSPELVSSP AASSSSATALQIQPPGSP — DPPP 54 

PPP S V SP P PS P V SP A SS ++ PP +P PPP 

Sbjct: 1060 PPPPAPISSPPPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSPPPPIKSPPPPAPVSSPPP 1119 

Query: 55 AP PAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRS 108 

AP P PAP SS P V P K+ + PP E P +L + 

Sbjct: 1120 APVKPPSLPPPAPVSSPPPVVTPAPPKKE EQSLPPPAESQPPPSFNDI ILPPIMANK 1176 

Query: 109 VGAP 112 
+P 

Sbjct: 1177 YASP 1180 

Score = 108 (16.2 bits), Expect = 1.3e-02, P = 1.3e-02 
Identities = 46/155 (29%), Positives = 67/155 (43%) 

Query: 114 GAPTPALGPSAPQKPLRRALSGRASPVPAPSSGLHAAVR-LKACS-LAASEGLSSAQPNG 171 

G PTP GP + P + A S +P+P+P + + LS+A+ P+ 
Sbjct: 408 GYPTPGGGPPSSPVPGKPAAS APMPSPHTPPDVSPEPLPEPSPVPAPAPMPMPTPHS 464 

Query: 172 PPEAEPRPPQS PASTAS FI FSKGSRKLQLERPVS PETQ ADLQRNLVAELRSISEQR 227 

PP + PP P S + S ++Q +P + Q + + + 

Sbjct: 465 PPADDYVPPTPPVPGKSPPATSPSPQVQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTS 524 

Query: 228 PPQAPKKSPKAPPPVARKPSVGVPPPASPSYPRAEPLTAPP 268 

PP AP SP PPPV SV PPP S P P+ +PP 
Sbjct: 525 PP-APIGSPSPPPPV SVVSPPPPVKSPPPPAPVGSPP 560 

Pedant information for DKFZphmcfl_lc23, frame 1 



Report for DKFZphmcfl_lc23.1 

[LENGTH] 311 

[MW] 31534.58 

[pI] 9.48 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 38.59 % 

SEQ MADFPPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPPGSPDPPPAPPAPA 

SEG xxxxxxxxxxxxxxx . . xxxxxxxxxxxx .... xxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccceeeeecccccccccccccccc 

SEQ PASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPAL 
SEG xxxxxx xxxxxxxxxxx 
PRD cccccccccccccccccccccccccccccccccccchhhhhhhhhhhccccccccccccc 

SEQ GPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPP 

SEG xxxxx xxxxxxxxxxxxx 

PRD cccccchhhhhhhhhcccccccccchhhhhhhhhhhhhhhhccccccccccccccccccc 

SEQ QSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKSPKAPP 

SEG xxxxx xxxxxxxxxxxxxxx 

PRD ccccccceeeecccchhhhhccccccchhhhhhhhhhhhhhhhccccccccccccccccc 

SEQ PVARKPSVGVPPPASPSYPRAEPLTAPPTNGLPHTQDRTKRELAENGGVLQLVGPEEKMG 

SEG xxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccchhhhhhhcccceeeccccccccc 

SEQ LPGSDSQKELA 

SEG 

PRD ccccccccccc 



(No Prosite data available for DKFZphmcfl_lc23.1) 
(No Pfam data available for DKFZphmcfl_lc23.1) 






DKFZphmcfl_lel5 



group: transmembrane protein 

DKFZphmcfl_lel5 encodes a novel 454 amino acid protein with similarity to C. elegans proteins 
and transporter proteins. 

The novel protein is similar to the PTR2 family of proton/oligopeptide symporter proteins and 
the D-xylose-proton symporter. Thus, the protein is a transporter of an as yet unknown 
compound. 

The new protein can find application as a new transporter in eukaryotic cells, e.g. in drug 
transport into cells. 

similarity to D-XYLOSE TRANSPORTER 
membrane regions: 9 

complete cDNA, complete cds, EST hits 

matches cDNA encoding cell growth inhibiting factor (E12646) 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1957 bp 

Poly A stretch at pos. 1947, polyadenylation signal at pos. 1929 



1 GGTGCAGCGC CCGGGCTGAG CGACAGCAAG TGCAGCGGGC TCCTACCCCG 
51 GGTGAGGGGT GGCCTCCGCG TGGGATCGTG CCCTCTTCAG CCCGCTCCTG 
101 TCCCCGACAT CACGTGTATT CCGCACGTCC CCTCCGCGCT GTGTGTCTAC 
151 TGAGACGGGG AGGCGTGACA GGGCCCGGGT CCCTTCTCAG TGGTGCTCTG 
201 TGCTTCAGGG CAAGCTCCCC GTCTCCGGGC GCACTTCCCT CGCCTGTGTT 
251 CGGTCCATCC TCCTTTCTCC AGCCTCCTCC CCTCGCAGGT GGGATCGTCG 
301 GTGGGACCGG AGCGCGGGCG GGCGCGGCCC CCCGGGACCA TGGCCGGGTC 
351 CGACACCGCG CCCTTCCTCA GCCAGGCGGA TGACCCGGAC GACGGGCCAG 
401 TGCCTGGCAC CCCGGGGTTG CCAGGGTCCA CGGGGAACCC GAAGTCCGAG 
451 GAGCCCGAGG TCCCGGACCA GGAGGGGCTG CAGCGCATCA CCGGCCTGTC 
501 TCCCGGCCGT TCGGCTCTCA TAGTGGCGGT GCTGTGCTAC ATCAATCTCC 
551 TGAACTACAT GGACCGCTTC ACCGTGGCTG TGTTCATCTC CAGTTACATG 
601 GTGTTGGCAC CTGTGTTTGG CTACCTGGGT GACAGGTACA ATCGGAAGTA 
651 TCTCATGTGC GGGGGCATTG CCTTCTGGTC CCTGGTGACA CTGGGGTCAT 
701 CCTTCATCCC CGGAGAGCAT TTCTGGCTGC TCCTCCTGAC CCGGGGCCTG 
751 GTGGGGGTCG GGGAGGCCAG TTATTCCACC ATCGCGCCCA CTCTCATTGC 
801 CGACCTCTTT GTGGCCGACC AGCGGAGCCG GATGCTCAGC ATCTTCTACT 
851 TTGCCATTCC GGTGGGCAGT GGTCTGGGCT ACATTGCAGG CTCCAAAGTG 
901 AAGGATATGG CTGGAGACTG GCACTGGGCT CTGAGGGTGA CACCGGGTCT 
951 AGGAGTGGTG GCCGTTCTGC TGCTGTTCCT GGTAGTGCGG GAGCCGCCAA 
1001 GGGGAGCCGT GGAGCGCCAC TCAGATTTGC CACCCCTGAA CCCCACCTCG 
1051 TGGTGGGCAG ATCTGAGGGC TCTGGCAAGA AATCTCATCT TTGGACTCAT 
1101 CACCTGCCTG ACCGGAGTCC TGGGTGTGGG CCTGGGTGTG GAGATCAGCC 
1151 GCCGGCTCCG CCACTCCAAC CCCCGGGCTG ATCCCCTGGT CTGTGCCACT 
1201 GGCCTCCTGG GCTCTGCACC CTTCCTCTTC CTGTCCCTTG CCTGCGCCCG 
1251 TGGTAGCATC GTGGCCACTT ATATTTTCAT CTTCATTGGA GAGACCCTCC 
1301 TGTCCATGAA CTGGGCCATC GTGGCCGACA TTCTGCTGTA CGTGGTGATC 
1351 CCTACCCGAC GCTCCACCGC CGAGGCCTTC CAGATCGTGC TGTCCCACCT 
1401 GCTGGGTGAT GCTGGGAGCC CCTACCTCAT TGGCCTGATC TCTGACCGCC 
1451 TGCGCCGGAA CTGGCCCCCC TCCTTCTTGT CCGAGTTCCG GGCTCTGCAG 
1501 TTCTCGCTCA TGCTCTGCGC GTTTGTTGGG GCACTGGGCG GCGCAGCCTT 
1551 CCTGGGCACC GCCATCTTCA TTGAGGCCGA CCGCCGGCGG GCACAGCTGC 
1601 ACGTGCAGGG CCTGCTGCAC GAAGCAGGGT CCACAGACGA CCGGATTGTG 
1651 GTGCCCCAGC GGGGCCGCTC CACCCGCGTG CCCGTGGCCA GTGTGCTCAT 
1701 CTGAGAGGCT GCCGCTCACC TACCTGCACA TCTGCCACAG CTGGCCCTGG 
1751 GCCCACCCCA CGAAGGGCCT GGGCCTAACC CCTTGGCCTG GCCCAGCTTC 
1801 CAGAGGGACC CTGGGCCGTG TGCCAGCTCC CAGACACTAC ATGGGTAGCT 
1851 CAGGGGAGGA GGTGGGGGTC CAGGAGGGGG ATCCCTCTCC ACAGGGGCAG 
1901 CCCCAAGGGC TCGGTGCTAT TTGTAACGGA ATAAAATTTG TAGCCAGAAA 
1951 AAAAAAA 
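
The poly(A) annotations given for this insert can be checked directly against the listed sequence. The following is a minimal sketch, not the annotation procedure used for this report: the helper name `find_polya_features`, the choice of the canonical AATAAA hexamer, and the minimum run length of seven A residues are all assumptions made for illustration. The function returns zero-based offsets; for the last rows of the listing (starting at base 1901) these appear to coincide with the reported positions 1929 and 1947.

```python
import re


def find_polya_features(seq, signal="AATAAA", min_run=7):
    """Locate the terminal poly(A) run and the last signal hexamer before it.

    Returns (signal_offset, polya_offset) as zero-based offsets into the
    cleaned sequence; either value is None when the feature is absent.
    Hypothetical helper: 'signal' and 'min_run' are illustrative defaults.
    """
    # Drop block spacing and position numbers from the listing format.
    seq = re.sub(r"[^ACGTN]", "", seq.upper())
    run = re.search(r"A{%d,}$" % min_run, seq)
    polya_offset = run.start() if run else None
    search_end = polya_offset if polya_offset is not None else len(seq)
    hit = seq.rfind(signal, 0, search_end)
    return (hit if hit >= 0 else None), polya_offset


# Last two rows of the listing above; the first base shown is base 1901,
# so 1900 is added to convert local offsets to insert coordinates.
tail = "CCCCAAGGGC TCGGTGCTAT TTGTAACGGA ATAAAATTTG TAGCCAGAAA AAAAAAA"
signal_offset, polya_offset = find_polya_features(tail)
print(1900 + signal_offset, 1900 + polya_offset)
```

Searching only upstream of the terminal A run keeps the sketch from reporting a hexamer that lies inside the tail itself.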



BLAST Results 



Entry E12646 from database EMBL: 

cDNA encoding cell growth inhibiting factor. 

Score = 3046, P = 2.2e-131, identities = 640/659 

Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 340 bp to 1701 bp; peptide length: 454 
Category: similarity to known protein 



1 MAGSDTAPFL SQADDPDDGP VPGTPGLPGS TGNPKSEEPE VPDQEGLQRI 

51 TGLSPGRSAL IVAVLCYINL LNYMDRFTVA VFISSYMVLA PVFGYLGDRY 

101 NRKYLMCGGI AFWSLVTLGS SFIPGEHFWL LLLTRGLVGV GEASYSTIAP 

151 TLIADLFVAD QRSRMLSIFY FAIPVGSGLG YIAGSKVKDM AGDWHWALRV 

201 TPGLGVVAVL LLFLVVREPP RGAVERHSDL PPLNPTSWWA DLRALARNLI 

251 FGLITCLTGV LGVGLGVEIS RRLRHSNPRA DPLVCATGLL GSAPFLFLSL 

301 ACARGSIVAT YIFIFIGETL LSMNWAIVAD ILLYVVIPTR RSTAEAFQIV 

351 LSHLLGDAGS PYLIGLISDR LRRNWPPSFL SEFRALQFSL MLCAFVGALG 

401 GAAFLGTAIF IEADRRRAQL HVQGLLHEAG STDDRIVVPQ RGRSTRVPVA 
451 SVLI 
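
The ORF coordinates and peptide lengths quoted in this listing can be cross-checked with simple arithmetic. The sketch below assumes that the coordinates are 1-based and inclusive and that the stop codon lies outside the quoted span; this is an assumption that happens to fit the three ORFs reported in this section, not a documented convention. The function name `orf_matches_peptide` is hypothetical.

```python
def orf_matches_peptide(start, end, peptide_length):
    """Check that the inclusive span start..end holds exactly peptide_length codons.

    Assumes 1-based inclusive nucleotide coordinates with the stop codon
    excluded (an assumption, see the lead-in).
    """
    span = end - start + 1
    return span % 3 == 0 and span // 3 == peptide_length


# (ORF start, ORF end, peptide length) as quoted in this section.
reported_orfs = {
    "DKFZphmcfl_lel5": (340, 1701, 454),
    "DKFZphmcfl_lgl3": (94, 1812, 573),
    "DKFZphtes3_14g5": (144, 1280, 379),
}

for clone, (start, end, length) in reported_orfs.items():
    print(clone, orf_matches_peptide(start, end, length))
```

A mismatch here would point to a transcription error in either the coordinates or the peptide length rather than to a biological problem.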

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphmcfl_lel5, frame 1 

TREMBL:CEC13C4_1 gene: "C13C4.5"; Caenorhabditis elegans cosmid C13C4, 
N = 3, Score = 441, P = 5.2e-76 

TREMBL:CEC39E9_10 gene: "C39E9.10"; Caenorhabditis elegans cosmid 
C39E9, N = 2, Score = 449, P = 8.2e-69 

TREMBL:CEF09A5_1 gene: "F09A5.1"; Caenorhabditis elegans cosmid F09A5, 
N = 3, Score = 413, P = 9.1e-60 

TREMBL:ATF6H11_18 gene: "F6H11.180"; product: "predicted protein"; 
Arabidopsis thaliana DNA chromosome 5, BAC clone F6H11 (ESSAII 
project), N = 3, Score = 193, P = 2.5e-24 

SWISSPROT:XYLT_LACBR D-XYLOSE-PROTON SYMPORT (D-XYLOSE TRANSPORTER)., N 
= 1, Score = 180, P = 7.9e-11 



>TREMBL:CEC39E9_10 gene: "C39E9.10"; Caenorhabditis elegans cosmid C39E9 
Length = 488 

HSPs: 

Score = 449 (67.4 bits), Expect = 8.2e-69, Sum P(2) = 8.2e-69 
Identities = 88/204 (43%), Positives = 125/204 (61%) 



Query: 58 SALIVAVLCYINLLNYMDRFTVAVFISSYMVLAPVFGYLGDRYNRKYLMCGGIAFWSLVT 117 

+ ++ V Y N+ + + VF+ S+MV +PV GYLGDR+NRK++M G+ W 

Sbjct: 29 AGVLTQVQTYYNISDSLGGLIQTVFLISFMVFSPVCGYLGDRFNRKWIMIIGVGIWLGAV 88 

Query: 118 LGSSFIPGEHFWLLLLTRGLVGVGEASYSTIAPTLIADLFVADQRSRMLSIFYFAIPVGS 177 

LGSSF+P HFWL L+ R VG+GEASYS +AP+LI+D+F +RS + IFYFAIPVGS 

Sbjct: 89 LGSSFVPANHFWLFLVLRSFVGIGEASYSNVAPSLISDMFNGQKRSTVFMIFYFAIPVGS 148 

Query: 178 GLGYIAGSKVKDMAGDWHWALRVTPGLGVVAVLLLFLVVREPPRGAVERHSDLPPL 233 

GLG+I GS V + G W W +RV+ G++ ++ L L EP RGA ++ D+ 

Sbjct: 149 GLGFIVGSNVATLTGHWQWGIRVSAIAGLIVMIALVLFTYEPERGAADKAMGESKDVVVT 208 

Query: 234 NPTSWWADLRALARNLIFGLITCLTG 259 

T++ DL L + L+ C G 

Sbjct: 209 TNTTYLEDLVILLKTPT--LVACTWG 232 



Score = 267 (40.1 bits), Expect = 8.2e-69, Sum P(2) = 8.2e-69 
Identities = 74/212 (34%), Positives = 113/212 (53%) 



Query: 249 LI FGLITCLTGVLGVGLGVEI SRRL RHSNPRADPLVCATGLLGSAPFLFLSL 300 

L FG IT G++GV G +S+ L R RA PLV G L +APFL + + 

Sbjct: 277 LYFGAITTAGGLIGVIFGSMLSKWLVAGWGPFRRLQTDRAQPLVAGGGALLAAPFLLIGM 336 

Query: 301 ACARGSIVATYIFIFIGETLLSMNWAIVADILLYVVIPTRRSTAEAFQIVLSHLLGDAGS 360 

S+V YIIFGT+ NW+ D+L V+ P RRSTA ++ +++SHL GDA 
Sbjct: 337 IFGDKSLVLLYIMIFFGITFMCFNWGLNIDMLTTVIHPNRRSTAFSYFVLVSHLFGDASG 396 

Query: 361 PYLIGLISDRLRRN--WPPSFLSEFRALQFSLMLCAFVGALGGAAFLGTAIFIEADRR-- 416 

PYLIGLISD +R +P ++ +L + C + L + +++ + +DR+ 

Sbjct: 397 PYLIGLISDAIRHGSTYPKD QYHSLVSATYCCVALLLLSAGLYFVSSLTLVSDRKKF 453 

Query: 417 RAQLHVQGLLHEA--GSTD--DRIVVPQRGRSTRV 447 

RA++ + h + STD +RI + S+R+ 
Sbjct: 454 RAEMGLDDLQSKPIRTSTDSLERIGINDDVASSRL 488 



Score = 70 (10.5 bits), Expect = 5.9e-24, Sum P(2) = 5.9e-24 
Identities = 25/89 (28%), Positives = 41/89 (46%) 



Query: 62 VAVLCYINLLNYMDRFTVAVFISSYMVLAPVFGYLGDRYNRKYLMCGGIAFWSLVT--LG 119 

V L +NLLNY+DR+TVA ++ + LG +L+ +S V LG 

Sbjct: 11 VTALFVVNLLNYVDRYTVAGVLTQVQTYYNISDSLGGLIQTVFLI--SFMVFSPVCGYLG 68 

Query: 120 SSFIPGEHFWLLLLTRGLVGVGEASYSTIAP 150 

F W++++ G + +G S+ P 

Sbjct: 69 DRF NRKWIMIIGVG-IWLGAVLGSSFVP 95 

Pedant information for DKFZphmcfl_lel5, frame 1 



Report for DKFZphmcfl_lel5.1 



[LENGTH] 454 
[MW] 49013.35 
[pI] 7.66 
[HOMOL] TREMBL:CEC13C4_1 gene: "C13C4.5" Caenorhabditis elegans cosmid C13C4 2e-51 
[BLOCKS] BL01022D 
[PROSITE] MYRISTYL 11 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 3 
[PROSITE] PROKAR_LIPOPROTEIN 1 
[PROSITE] GLYCOSAMINOGLYCAN 1 
[PROSITE] PKC_PHOSPHO_SITE 4 
[KW] TRANSMEMBRANE 8 
[KW] LOW_COMPLEXITY 15.42 % 

SEQ MAGSDTAPFLSQADDPDDGPVPGTPGLPGSTGNPKSEEPEVPDQEGLQRITGLSPGRSAL 

SEG xxxxxxxxxxxxxxxx 

PRD cccccceeeeeecccccccccccccccccccccccccccccccccceeeecccccchhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMM 



SEQ IVAVLCYINLLNYMDRFTVAVFISSYMVLAPVFGYLGDRYNRKYLMCGGIAFWSLVTLGS 

SEG 

PRD hhhhhhhhccccccccceeeeeehhhhheeeecccccccccceeeeeeeccceeeeeecc 

MEM MMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 



SEQ SFIPGEHFWLLLLTRGLVGVGEASYSTIAPTLIADLFVADQRSRMLSIFYFAIPVGSGLG 

SEG xxxxxxxxxxxx 

PRD cccccchhhhhhhhhhccccccceeeeecceeeccccccccchhhhheeeeeecccccce 

MEM MMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMM 

SEQ YIAGSKVKDMAGDWHWALRVTPGLGVVAVLLLFLVVREPPRGAVERHSDLPPLNPTSWWA 

SEG xxxxxxxxxxxxx 

PRD eeecccccccccccceeeeeeccchhhhhhhhhhhhcccccchhhhhccccccccccchh 

MEM MMMMMMMMM 

SEQ DLRALARNLIFGLITCLTGVLGVGLGVEISRRLRHSNPRADPLVCATGLLGSAPFLFLSL 

SEG xxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhheeeecccceeehhhhhhhhhhccccccceeecccceeeecccceeec 

MEM MMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ ACARGSIVATYIFIFIGETLLSMNWAIVADILLYVVIPTRRSTAEAFQIVLSHLLGDAGS 

SEG 

PRD ccccchhhhheeeeeeccccccccchhhhhhheeeeeccccchhhhhhcccccccccccc 

MEM MMMM MMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMM 

SEQ PYLIGLISDRLRRNWPPSFLSEFRALQFSLMLCAFVGALGGAAFLGTAIFIEADRRRAQL 

SEG xxxxxxxxxxxxx 

PRD ceeehhhhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhccccceeeeeehhhhhhhh 

MEM MMMMMMMM MM 



SEQ HVQGLLHEAGSTDDRIVVPQRGRSTRVPVASVLI 

SEG 

PRD hhhhhhhhccccceeeeeeccccccceeeeeccc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMM 



Prosite for DKFZphmcfl_lel5.1 

PS00002 177->181 GLYCOSAMINOGLYCAN PDOC00002 

PS00004 340->344 CAMP_PHOSPHO_SITE PDOC00004 

PS00005 270->273 PKC_PHOSPHO_SITE PDOC00005 

PS00005 339->342 PKC_PHOSPHO_SITE PDOC00005 

PS00005 368->371 PKC_PHOSPHO_SITE PDOC00005 

PS00005 444->447 PKC_PHOSPHO_SITE PDOC00005 

PS00006 11->15 CK2_PHOSPHO_SITE PDOC00006 

PS00006 342->346 CK2_PHOSPHO_SITE PDOC00006 

PS00006 431->435 CK2_PHOSPHO_SITE PDOC00006 

PS00008 26->32 MYRISTYL PDOC00008 

PS00008 32->38 MYRISTYL PDOC00008 

PS00008 52->58 MYRISTYL PDOC00008 

PS00008 139->145 MYRISTYL PDOC00008 

PS00008 176->182 MYRISTYL PDOC00008 

PS00008 252->258 MYRISTYL PDOC00008 

PS00008 262->268 MYRISTYL PDOC00008 

PS00008 266->272 MYRISTYL PDOC00008 

PS00008 288->294 MYRISTYL PDOC00008 

PS00008 305->311 MYRISTYL PDOC00008 

PS00008 397->403 MYRISTYL PDOC00008 

PS00013 292->303 PROKAR_LIPOPROTEIN PDOC00013 



(No Pfam data available for DKFZphmcfl_lel5.1) 



DKFZphmcfl_lgl3 



group: mammary carcinoma derived 

DKFZphmcfl_lgl3 encodes a novel 573 amino acid protein with very weak similarity to the human 
KIAA0543 protein and Musca domestica hermes transposase. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of mammary carcinoma- 
specific genes. 



similarity to KIAA0766 

complete cDNA, complete cds, few EST hits 

on genomic level encoded by AC005020 # no splicing, genomic? 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 2210 bp 

Poly A stretch at pos. 2200, polyadenylation signal at pos. 2176 



1 GAAACCTGAT CTCATAAAAC CTAGGTCACA AAGGACAGCC CTGCAAAACA 
51 GACCCTATTT GGATCAAGTG AGCCAGTTCC TGGAACCTGA ATAATGACTC 
101 CTGAATCAAG GGATACTACA GATTTGTCTC CAGGGGGTAC CCAGGAGATG 
151 GAAGGCATCG TGATAGTGAA GGTGGAGGAG GAAGATGAAG AAGACCATTT 
201 TCAAAAGGAA AGAAACAAAG TAGAGTCATC GCCACAAGTT CTCAGTCGCT 
251 CTACAACTAT GAATGAGAGA GCCTTATTGT CATCGTATTT AGTTGCATAT 
301 AGAGTGGCAA AAGAGAAAAT GGCTCACACA GCGGCTGAAA AAATTATCCT 
351 TCCAGCATGT ATGGACATGG TACGGACAAT TTTTGATGAC AAATCAGCTG 
401 ATAAACTAAG AACTATACCT CTTAGTGATA ATACAATATC TCGTCGAATC 
451 TGTACGATTG CAAAACATTT GGAAGCAATG CTTATTAGAC GGCTGCAGTC 
501 CGGTATAGAC TTTGCAATCC AACTCGATGA GAGCACTGAT ATTGCAAGTT 
551 GTCCCACACT CTTGGTTTAT GTCAGATATG TGTGGCAAGA TGATTTTGTA 
601 GAGGATCTCT TATGTTGTTT AAATTTAAAT TCACATATAA CTGGATTAGA 
651 TTTATTTACT GAATTAGAAA ACTGCCTTCT TGGTCAGTAT AAATTAAACT 
701 GGAAACATTG TAAAGGAATT TCAAGTGATG GAACAGCAAA TATGACCGGA 
751 AAACACAGCA GACTTACTGA AAAATTGTTA GAAGCAACCC ACAACAATGC 
801 TGTTTGGAAT CACTGTTTTA TTCATCGAGA AGCTTTGGTA TCCAAAGAAA 
851 TTTCACCAAG TCTGATGGAT GTATTGAAAA ATGCAGTGAA AACTGTTAAT 
901 TTTATTAAAG GAAGCTCACT GAATAGCCGA CTTCTCGAAA TATTTTGTTC 
951 AGAGATTGGA GTGAACCACA CCCACTTATT GTTTCATACA GAAGTTCGTT 
1001 GGCTTTCTCA AGGAAAAGTA TTGAGCAGAG TATATGAACT CAGGAACGAG 
1051 ATTTACATTT TTCTCGTTGA AAAGCAATCT CATTTGGCAA ATATTTTTGA 
1101 AGACGACATT TGGGTAACAA AATTGGCATA TTTAAGTGAT ATTTTTGGCA 
1151 TTCTTAATGA ATTAAGCCTG AAAATGCAGG GGAAAAACAA TGATATATTT 
1201 CAGTATCTTG AACATATTCT AGGATTCCAA AAGACGTTAT TATTGTGGCA 
1251 AGCAAGACTT AAAAGTAACC GCCCTAGCTA CTATATGTTT CCAACATTAT 
1301 TGCAACACAT CGAAGAGAAC ATTATTAATG AAGACTGCTT AAAAGAAATA 
1351 AAATTAGAGA TATTGTTGCA TCTCACTTCT TTGTCTCAAA CTTTTAATTA 
1401 TTACTTTCCG GAAGAGAAAT TTGAATCATT AAAGGAAAAT ATTTGGATGA 
1451 AAGATCCATT TGCTTTTCAA AACCCAGAAT CAATAATTGA GTTAAACTTG 
1501 GAGCCTGAAG AAGAGAATGA ATTATTGCAG CTCAGTTCAT CATTCACACT 
1551 AAAGAATTAT TATAAGATAT TAAGTTTATC AGCATTTTGG ATTAAGATTA 
1601 AAGATGACTT TCCACTGCTA AGTAGGAAGA GTATATTGCT GTTACTACCA 
1651 TTCACAACTA CATATTTGTG TGAACTAGGA TTTTCAATCT TGACACGGTT 
1701 AAAAACAAAG AAGAGAAATA GGCTCAATAG TGCACCAGAT ATGCGGGTAG 
1751 CATTATCTTC ATGTGTTCCT GACTGGAAGG AACTTATGAA CAGACAAGCA 
1801 CACCCATCAC ATTAAATACA AACTTTACAA AATTCTGTGT ATAGCCAGGT 
1851 GTGGTGGCTT ACGCCTGTAA TCCCAGCAGT GGGAGACCGA GGTGGGCAGA 
1901 TCACTTGAGT TCAAGACCAG CCTGGCCAAC ATGGTGAAAC CCCATCTCTA 
1951 CTAAAAATAG AAACCTTAGC CAGGCGTGGT GGCACATGCC TGCAGTCCCA 
2001 GTTACTTGGG TGCCTGAGGC AGGAGAATCT CTTAAACCAG GAAGGCAGAG 
2051 ATTGCAGTGA GCTGAGATAA TCCCACTGCA TTCCAGCCTG GGCAACAGCG 
2101 TGAGACTTCA TCTCAAAAAA AAAAAATTGT ATTTGTACTT TTAAAGGGAT 
2151 TTTGCAGTAT GTTGTAGTTA AACGTTAATA AAATTATATT TGTAATTAGG 
2201 AAAAAAAAAA 



BLAST Results 



Entry AC005020 from database EMBL: 

Homo sapiens clone GS259H13; HTGS phase 1, 4 unordered pieces. 
Score = 9110, P = 0.0e+00, identities = 1822/1822 

Medline entries 

No Medline entry 

Peptide information for frame 1 



ORF from 94 bp to 1812 bp; peptide length: 573 
Category: similarity to unknown protein 



1 MTPESRDTTD LSPGGTQEME GIVIVKVEEE DEEDHFQKER NKVESSPQVL 
51 SRSTTMNERA LLSSYLVAYR VAKEKMAHTA AEKIILPACM DMVRTIFDDK 
101 SADKLRTIPL SDNTISRRIC TIAKHLEAML ITRLQSGIDF AIQLDESTDI 
151 ASCPTLLVYV RYVWQDDFVE DLLCCLNLNS HITGLDLFTE LENCLLGQYK 
201 LNWKHCKGIS SDGTANMTGK HSRLTEKLLE ATHNNAVWNH CFIHREALVS 
251 KEISPSLMDV LKNAVKTVNF IKGSSLNSRL LEIFCSEIGV NHTHLLFHTE 
301 VRWLSQGKVL SRVYELRNEI YIFLVEKQSH LANIFEDDIW VTKLAYLSDI 
351 FGILNELSLK MQGKNNDIFQ YLEHILGFQK TLLLWQARLK SNRPSYYMFP 
401 TLLQHIEENI INEDCLKEIK LEILLHLTSL SQTFNYYFPE EKFESLKENI 
451 WMKDPFAFQN PESIIELNLE PEEENELLQL SSSFTLKNYY KILSLSAFWI 
501 KIKDDFPLLS RKSILLLLPF TTTYLCELGF SILTRLKTKK RNRLNSAPDM 
551 RVALSSCVPD WKELMNRQAH PSH 

BLASTP hits 

Entry AC004877_3 from database TREMBLNEW: 

gene: "WUGSC:H_DJ0751H13.2"; product: "KIAA0543 protein"; Homo sapiens 
PAC clone DJ0751H13 from 7q35-qter, complete sequence. 

Score = 86, P = 4.4e-03, identities = 46/179, positives = 78/179 

Entry MD36211_1 from database TREMBL: 

product: "Hermes transposase"; Musca domestica Hermes transposase 
gene, complete cds. 

Score = 105, P = 3.0e-02, identities = 101/465, positives = 202/465 
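
The identity and positive fractions quoted for each hit can be converted to percentages for comparison across hits. The following is a minimal sketch that assumes the summary-line layout used in this listing; the helper name `hit_percentages` is hypothetical. Integer floor division is also an assumption, chosen because it reproduces the rounded-down percentages printed in the HSP headers of this listing (for example, 79/264 is reported as 29%).

```python
import re

# Matches "identities = 46/179" as well as "Identities = 79/264 (29%)".
FRACTION = re.compile(r"(identities|positives)\s*=\s*(\d+)/(\d+)", re.IGNORECASE)


def hit_percentages(line):
    """Return {'identities': pct, 'positives': pct} with percentages floored.

    Hypothetical helper; the floor rounding is an assumption (see lead-in).
    """
    result = {}
    for name, matched, aligned in FRACTION.findall(line):
        result[name.lower()] = 100 * int(matched) // int(aligned)
    return result


kiaa0543 = "Score = 86, P = 4.4e-03, identities = 46/179, positives = 78/179"
hermes = "Score = 105, P = 3.0e-02, identities = 101/465, positives = 202/465"
print(hit_percentages(kiaa0543))
print(hit_percentages(hermes))
```

On these two lines the sketch gives roughly a quarter identical residues for the KIAA0543 hit and about a fifth for the Hermes transposase hit, in line with the "very weak similarity" stated for this clone.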



Alert BLASTP hits for DKFZphmcfl_lgl3, frame 1 

TREMBL:AB018309_1 gene: "KIAA0766"; product: "KIAA0766 protein"; Homo 
sapiens mRNA for KIAA0766 protein, complete cds., N = 1, Score = 300, P 
= 1.1e-23 

>TREMBL:AB018309_1 gene: "KIAA0766"; product: "KIAA0766 protein"; Homo 
sapiens mRNA for KIAA0766 protein, complete cds. 
Length = 607 

HSPs: 

Score = 300 (45.0 bits), Expect = 1.1e-23, P = 1.1e-23 
Identities = 120/485 (24%), Positives = 229/485 (47%) 


Query: 89 CMD-MVRTIFDDKSADKLRTIPLSDNTISRRICTIAKHLEAMLITRLQSGIDFAIQLDES 

CM+ ++R + + L+ + LS + +RI +1 ++L L R + +++ LD+ 

Sbjct: 124 

Query: 148 

+A LLV++R V + + EDLL +NL H + G + LE+ L L+ + 

Sbjct: 183 

Query: 206 

G+++ T M G++S L + E + WN H F+H E L S ++ + 

Sbjct: 241 

Query: 262 KNAVKTVNFIKGSSLNSRLLEIFCSEIGVNHTHLLFHTEVR-WLSQGKVLSRVYELRNEI 320 

+ + IK + + +E H + + WL +GK L ++ LR E+ 

Sbjct: 299 

Query: 321 YIFLVEKQSHLANIFEDDIWVTKLAYLSDIFGILNELSLKMQGKNNDIFQYLEHILGFQK 380 

FLV + + F D W+ +L DI L ELS +++ +HI F+ 

Sbjct: 359 EAFLVSVGATTVH-FSDKQWLCDFGFLVDIMEHLRELSEELRVSKVFAAAAFDHICTFEV 417 

Query: 381 TLLLWQARLKSNRPSYYMFPTLLQHIEE NIINEDCLKEIKLEILLHLTSLSQTFNY 436 

L L+Q ++ + FP L + ++E N +E + ++++ L + F 

Sbjct: 418 KLNLFQRHIEEKNLTD--FPALREVVDELKQQNKEDEKI FDPDRYQMVI--CRLQKEFER 473 

Query: 437 YFPEEKFESLKENIWM-KDPFAFQNPESIIELNLEPEEENELLQLSSSFTLKNYYKILSL 495 

+F + +F +K+++ + +PF F+ + I + +E L +L ++ L N Y+I L 

Sbjct: 474 HFKDLRF--IKKDLELFSNPFNFKPEYAPISVRVE LTKLQANTNLWNEYRIKDL 525 

Query: 496 SAFWIKIK-DDFPLLSRKSILLLLPFTTTYLCELGFSILTRLKTKKRNRLNSA PDMR 551 

F+ + + +P++ + + F + +CE FS LTR + L R 

Sbjct: 526 GQFYAGLSAESYPIIKGVACKVASLFDSNQICEKAFSYLTRNQHTLSQPLTDEHLQALFR 585 

Query: 552 VALSSCVPDWKELMNRQAHPSH 573 

VA + P W +L+ R+ + S+ 
Sbjct: 586 VATTEMEPGWDDLV-RERNESN 606 

Score = 290 (43.5 bits), Expect = 1.5e-22, P = 1.5e-22 
Identities = 120/485 (24%), Positives = 228/485 (47%) 

Query: 89 CMD-MVRTIFDDKSADKLRTIPLSDNTISRRICTIAKHLEAMLITRLQSGIDFAIQLDES 147 

CM+ ++R + + L+ + LS + +RI +1 ++L L R + +++ LD+ 

Sbjct: 124 CMEVLLREVLPEH-VSVLQGVDLSPDITRQRILSIDRNLRNQLFNRARDFKAYSLALDDQ 182 

Query: 148 TDIASCPTLLVYVRYVWQD-DFVEDLLCCLNLNSHIT-GLDLFTELENCLLGQYKLNWKH 205 

+A LLV++R V + + EDLL +NL H + G + LE+ L L+ + 
Sbjct: 183 AFVAYENYLLVFIRGVGPELEVQEDLLTIINLTHHFSVGALMSAILES--LQTAGLSLQR 240 

Query: 206 CKGISSDGTANMTGKHSRLTEKLLEATHNNAVWNHCFIHREALVSKEISPSLMDV-LKNA 264 

G+++ T M G++S L + E + WN IH + E+ S DV + 
Sbjct: 241 MVGLTTTHTLRMIGENSGLVSYMREKAVSPNCWN--VIHYSGFLHLELLSSY-DVDVNQI 297 

Query: 265 VKTVN FIKGSSLNSRLLEIFCSEIGVNHTHLLFHTEVR-WLSQGKVLSRVYELRNE 319 

+ T++ IK + + +E H + + WL +GK L ++ LR E 

Sbjct: 298 INTISEWIVLIKTRGVRRPEFQTLLTESESEHGERVNGRCLNNWLRRGKTLKLIFSLRKE 357 

Query: 320 IYIFLVEKQSHLANIFEDDIWVTKLAYLSDIFGILNELSLKMQGKNNDIFQYLEHILGFQ 379 

+ FLV + + F D W+ +L DI L ELS +++ +HI F+ 

Sbjct: 358 MEAFLVSVGATTVH-FSDKQWLCDFGFLVDIMEHLRELSEELRVSKVFAAAAFDHICTFE 416 

Query: 380 KTLLLWQARLKSNRPSYYMFPTLLQHIEENIINEDCLKEIKL EILLHLTSLSQTFN 435 

L L+Q ++ + FP L + ++E + + ++K+ ++L + F 

Sbjct: 417 VKLNLFQRHIEEKNLTD--FPALREVVDE--LKQQNKEDEKIFDPDRYQMVICRLQKEFE 472 

Query: 436 YYFPEEKFESLKENIWM-KDPFAFQNPESIIELNLEPEEENELLQLSSSFTLKNYYKILS 494 

+ F + +F +K+++ + +PF F+ + I + +E L +L ++ L N Y+I 

Sbjct: 473 RHFKDLRF--IKKDLELFSNPFNFKPEYAPISVRVE LTKLQANTNLWNEYRIKD 524 

Query: 495 LSAFWIKIK-DDFPLLSRKSILLLLPFTTTYLCELGFSILTRLKTKKRNRLNSA PDM 550 

L F+ + + +P++ + + F + +CE FS LTR + L 
Sbjct: 525 LGQFYAGLSAESYPIIKGVACKVASLFDSNQICEKAFSYLTRNQHTLSQPLTDEHLQALF 584 

Query: 551 RVALSSCVPDWKELMNRQAHPSH 573 

RVA + P W +L+ R+ + S+ 
Sbjct: 585 RVATTEMEPGWDDLV-RERNESN 606 

Pedant information for DKFZphmcfl_lgl3, frame 1 



Report for DKFZphmcfl_lgl3.1 

[LENGTH] 573 

[MW] 66276.85 

[pI] 5.82 

[HOMOL] TREMBL:AB018309_1 gene: "KIAA0766"; product: "KIAA0766 protein"; Homo sapiens 
mRNA for KIAA0766 protein, complete cds. 1e-18 

[PROSITE] MYRISTYL 3 

[PROSITE] CK2_PHOSPHO_SITE 10 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 9 

[PROSITE] ASN_GLYCOSYLATION 2 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 8.90 % 

SEQ MTPESRDTTDLSPGGTQEMEGIVIVKVEEEDEEDHFQKERNKVESSPQVLSRSTTMNERA 
SEG xxxxxxx 



PRD ccccccccccccccccccceeeeeeeeccccchhhhhhhhhhcccccceeecccchhhhh 
SEQ LLSSYLVAYRVAKEKMAHTAAEKIILPACMDMVRTIFDDKSADKLRTIPLSDNTISRRIC 
SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhchhhhhhhhhhcccccceeeeecccchhhhhhh 

SEQ TIAKHLEAMLITRLQSGIDFAIQLDESTDIASCPTLLVYVRYVWQDDFVEDLLCCLNLNS 

SEG 

PRD hhhhhhhhhhhhhhhhhheeeccccccccccccccceeeeeeeccchhhhhhhhhhccce 

SEQ HITGLDLFTELENCLLGQYKLNWKHCKGISSDGTANMTGKHSRLTEKLLEATHNNAVWNH 

SEG 

PRD eeeehhhhhhhhhhhhhhhccccccccccccccceeeecccchhhhhhhhhhccccceee 

SEQ CFIHREALVSKEISPSLMDVLKNAVKTVNFIKGSSLNSRLLEIFCSEIGVNHTHLLFHTE 

SEG 

PRD hhhhhhhhhhhhcccchhhhhhhhhhhheeecccccchhhhhhhhhhccccchhhhhhhh 

SEQ VRWLSQGKVLSRVYELRNEIYIFLVEKQSHLANIFEDDIWVTKLAYLSDIFGILNELSLK 

SEG 

PRD cccccccchhhhhhhhhhhhhhhhhhhhchhhhhcccceeehhhhhhhhhhhhhhhhhhh 

SEQ MQGKNNDIFQYLEHILGFQKTLLLWQARLKSNRPSYYMFPTLLQHIEENIINEDCLKEIK 

SEG xxxxx 

PRD hhccccccchhhhhhhhhhhhhhhhhhhhhcccccccccchhhhhhhhhhhhcchhhhhh 

SEQ LEILLHLTSLSQTFNYYFPEEKFESLKENIWMKDPFAFQNPESIIELNLEPEEENELLQL 

SEG xxxxx xxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhccchhhhhhhhhhhhhcccccccccccceeecccchhhhhhhhh 

SEQ SSSFTLKNYYKILSLSAFWIKIKDDFPLLSRKSILLLLPFTTTYLCELGFSILTRLKTKK 

SEG xxx xxxxxxxxxxx 

PRD hhcccchhhhhhhhhhhhhcccccccccchhhhhhhhhccceeeeehhhhhhhhhhhhhh 

SEQ RNRLNSAPDMRVALSSCVPDWKELMNRQAHPSH 

SEG 

PRD hcccccccccceeeccccccchhhhhhhccccc 

Prosite for DKFZphmcfl_lgl3.1 

PS00001 216->220 ASN_GLYCOSYLATION PDOC00001 

PS00001 291->295 ASN_GLYCOSYLATION PDOC00001 

PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005 

PS00005 218->221 PKC_PHOSPHO_SITE PDOC00005 

PS00005 225->228 PKC_PHOSPHO_SITE PDOC00005 

PS00005 358->361 PKC_PHOSPHO_SITE PDOC00005 

PS00005 391->394 PKC_PHOSPHO_SITE PDOC00005 

PS00005 445->448 PKC_PHOSPHO_SITE PDOC00005 

PS00005 485->488 PKC_PHOSPHO_SITE PDOC00005 

PS00005 510->513 PKC_PHOSPHO_SITE PDOC00005 

PS00005 538->541 PKC_PHOSPHO_SITE PDOC00005 

PS00006 55->59 CK2_PHOSPHO_SITE PDOC00006 

PS00006 79->83 CK2_PHOSPHO_SITE PDOC00006 

PS00006 95->99 CK2_PHOSPHO_SITE PDOC00006 

PS00006 136->140 CK2_PHOSPHO_SITE PDOC00006 

PS00006 183->187 CK2_PHOSPHO_SITE PDOC00006 

PS00006 189->193 CK2_PHOSPHO_SITE PDOC00006 

PS00006 256->260 CK2_PHOSPHO_SITE PDOC00006 

PS00006 445->449 CK2_PHOSPHO_SITE PDOC00006 

PS00006 463->467 CK2_PHOSPHO_SITE PDOC00006 

PS00006 546->550 CK2_PHOSPHO_SITE PDOC00006 

PS00007 364->372 TYR_PHOSPHO_SITE PDOC00007 

PS00008 137->143 MYRISTYL PDOC00008 

PS00008 273->279 MYRISTYL PDOC00008 

PS00008 289->295 MYRISTYL PDOC00008 

(No Pfam data available for DKFZphmcfl_lgl3.1) 



DKFZphtes3_14g5 



group: testes derived 

DKFZphtes3_14g5 encodes a novel 379 amino acid protein with strong similarity to murine cell 
growth regulating nucleolar protein LYAR. 

The novel protein is very similar to murine Ly-1 antibody reactive clone protein (LYAR). It 
contains an ATP/GTP-binding site motif A (P-loop, which interacts with one of the phosphate groups of 
an ATP/GTP nucleotide), but not the zinc finger motif and nuclear localization signals of 
LYAR. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



strong similarity to cell growth regulating nucleolar protein LYAR, of 
mouse 

complete cDNA, complete cds, EST hits 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 1503 bp 

Poly A stretch at pos. 1467, polyadenylation signal at pos. 1440 



1 CCCAGAGGTC CGACCTGGGA GGCTGGGGCT CAGAGAGCAA TGTTTGCTGT 
51 CTTCCATTGG AGTGACTGAA TTTCTACATG ACGGCTTTTT GACAAGACTT 
101 AAAACCTGTC TTGGATAGAG AATATTTAGC CATTTACCTA AAAATGGTAT 
151 TTTTTACATG CAATGCATGT GGTGAATCAG TGAAGAAAAT ACAAGTGGAA 
201 AAGCATGTGT CTGTTTGCAG AAACTGTGAA TGCCTTTCTT GCATTGACTG 
251 CGGTAAAGAT TTCTGGGGCG ATGACTATAA AAACCACGTG AAATGCATAA 
301 GTGAAGATCA GAAGTATGGT GGCAAAGGCT ATGAAGGTAA AACCCACAAA 
351 GGCGACATCA AACAGCAGGC GTGGATTCAG AAAATTAGTG AATTAATAAA 
401 GAGACCCAAT GTCAGCCCCA AAGTGAGAGA ACTTTTAGAG CAAATTAGTG 
451 CTTTTGACAA CGTTCCCAGG AAAAAGGCAA AATTTCAGAA TTGGATGAAG 
501 AACAGTTTAA AAGTTCATAA TGAATCCATT CTGGACCAGG TGTGGAATAT 
551 CTTTTCTGAA GCTTCCAACA GCGAACCAGT CAATAAGGAA CAGGATCAAC 
601 GGCCACTCCA CCCAGTGGCA AATCCACATG CAGAAATCTC CACCAAGGTT 
651 CCAGCCTCCA AAGTGAAAGA CGCCGTGGAA CAGCAAGGGG AGGTGAAGAA 
701 GAATAAAAGA GAAAGAAAGG AAGAACGGCA GAAGAAAAGG AAAAGAGAAA 
751 AGAAAGAACT AAAGTTAGAA AACCACCAGG AAAACTCAAG GAATCAGAAG 
801 CCTAAGAAGC GCAAAAAGGG ACAGGAGGCT GACCTTGAGG CTGGTGGGGA 
851 GGAAGTCCCT GAGGCCAATG GCTCTGCAGG GAAGAGGAGC AAGAAGAAGA 
901 AGCAGCGCAA GGACAGCGCC AGTGAGGAAG AGGCACGCGT GGGCGCAGGG 
951 AAGAGGAAGC GGAGGCACTC GGAAGTTGAA ACAGATTCTA AGAAGAAAAA 
1001 GATGAAGCTC CCAGAGCATC CTGAGGGCGG AGAACCAGAA GACGATGAGG 
1051 CTCCTGCAAA AGGTAAATTC AACTGGAAGG GAACTATTAA AGCAATTCTG 
1101 AAACAGGCCC CAGACAATGA AATAACCATC AAAAAGCTAA GGAAAAAGGT 
1151 TTTAGCTCAG TACTACACAG TGACAGATGA GCATCACAGA TCCGAAGAGG 
1201 AACTCCTGGT CATCTTTAAC AAGAAAATCA GCAAGAACCC TACCTTTAAG 
1251 TTATTAAAGG ACAAAGTCAA GCTTGTGAAA TGAACATTTG TGTATTTAAA 
1301 AATTGAATCC ATTCTGCTGA CTTCTTCCTT TCACTGCTGT TTATAAAATG 
1351 TGTAATGAAT TCTAACAACT CAAATTTTGC TTTTTGAAGC TGTATTTTTA 
1401 AGTTAAGAAA ATATATTTTT GGTATAACTT TTATGAGAAA AATAAAATAT 
1451 ATTCTGGTCC AAACTTCAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
1501 AAA 
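The poly A and polyadenylation signal positions quoted above can be checked against the tail of the insert. The following is a minimal sketch, assuming the signal is the canonical AATAAA hexamer and that the reported positions are 0-based offsets (both are assumptions, not stated in the listing); the helper name is hypothetical.

```python
import re

# Last 103 bases of the insert (positions 1401-1503), copied from the listing.
TAIL_START = 1401  # 1-based coordinate of the first base in TAIL
TAIL = "".join("""
AGTTAAGAAA ATATATTTTT GGTATAACTT TTATGAGAAA AATAAAATAT
ATTCTGGTCC AAACTTCAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
AAA
""".split())


def find_signal_and_tail(tail, tail_start, signal="AATAAA", min_run=10):
    """Return 0-based insert offsets of the first signal hexamer and first A-run."""
    offset = tail_start - 1  # 1-based start coordinate -> 0-based offset
    sig = tail.find(signal)
    run = re.search("A{%d,}" % min_run, tail)
    return (sig + offset if sig >= 0 else None,
            run.start() + offset if run else None)


signal_pos, poly_a_pos = find_signal_and_tail(TAIL, TAIL_START)
print(signal_pos, poly_a_pos)
```

Under these assumptions the two offsets agree with the positions given in the annotation line.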



BLAST Results 



No BLAST result 



Medline entries 



93259460: 

LYAR, a novel nucleolar protein with zinc finger DNA-binding motifs, is
involved in cell growth regulation.

Peptide information for frame 3



ORF from 144 bp to 1280 bp; peptide length: 379
Category: strong similarity to known protein
Classification: Cell division
Prosite motifs: ATP_GTP_A (60-68)



1 MVFFTCNACG ESVKKIQVEK HVSVCRNCEC LSCIDCGKDF WGDDYKNHVK 
51 CISEDQKYGG KGYEGKTHKG DIKQQAWIQK ISELIKRPNV SPKVRELLEQ 
101 ISAFDNVPRK KAKFQNWMKN SLKVHNESIL DQVWNIFSEA SNSEPVNKEQ 
151 DQRPLHPVAN PHAEISTKVP ASKVKDAVEQ QGEVKKNKRE RKEERQKKRK 
201 REKKELKLEN HQENSRNQKP KKRKKGQEAD LEAGGEEVPE ANGSAGKRSK 
251 KKKQRKDSAS EEEARVGAGK RKRRHSEVET DSKKKKMKLP EHPEGGEPED 
301 DEAPAKGKFN WKGTIKAILK QAPDNEITIK KLRKKVLAQY YTVTDEHHRS 
351 EEELLVIFNK KISKNPTFKL LKDKVKLVK 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_14g5, frame 3 

PIR:A40683 cell growth regulating nucleolar protein LYAR - mouse, N =
1, Score = 1410, P = 2.7e-144

SWISSPROT:YQ58_CAEEL HYPOTHETICAL 28.5 KD PROTEIN C16C10.8 IN
CHROMOSOME III., N = 1, Score = 381, P = 2.9e-35

TREMBL:AC003058_18 gene: "F27F23.18"; product: "putative RNA-binding
protein"; Arabidopsis thaliana chromosome II BAC F27F23 genomic
sequence, complete sequence., N = 3, Score = 139, P = 4e-15

PIR:S70049 nucleic acid-binding protein YCR087c-a - yeast
(Saccharomyces cerevisiae), N = 1, Score = 164, P = 1.4e-11



>PIR:A40683 cell growth regulating nucleolar protein LYAR - mouse 
Length = 388

HSPs:

Score = 1410 (211.6 bits), Expect = 2.7e-144, P = 2.7e-144
Identities = 275/388 (70%), Positives = 317/388 (81%)

Query: 1 MVFFTCNACGESVKKIQVEKHVSVCRNCECLSCIDCGKDFWGDDYKNHVKCISEDQKYGG 60 

MVFFTCNACGESVKKIQVEK VS CRNCECLSCIDCGKDFWGDDYK+HVKCISE QKYGG 
Sbjct: 1 MVFFTCNACGESVKKIQVEKQVSNCRNCECLSCIDCGKDFWGDDYKSHVKCISEGQKYGG 60 

Query: 61 KGYEGKTHKGDIKQQAWIQKISELIKRPNVSPKVRELLEQISAFDNVPRKKAKFQNWMKN 120

KGYE KTHKGD KQQAWIQKI +ELIK+PNVSPKVRELL+QISAFDNVP K KAKFQNWMKN 
Sbjct: 61 KGYEAKTHKGDAKQQAWIQKINELIKKPNVSPKVRELLQQISAFDNVPIKKAKFQNWMKN 120 

Query: 121 SLKVHNESILDQVWNIFSEASNSEPVNKEQDQRPLHPVANPHAEIS-TKVPASKVKDAVE 179 

SLKVH++S+L+QVW+IFSEAS+SE ++Q Q P H A PHAE+ TKVP++K E 
Sbjct: 121 SLKVHSDSVLEQVWDIFSEASSSE---QDQQQPPSH-TAKPHAEMPITKVPSAKTNGTTE 176

Query: 180 QQGEVKKNKRERKEERQKKRKREKKELKLENHQENSRNQKPKKRKKGQEADLEAGGEEVP 239 

+Q E KKNKRERKEERQK RK+EKKELKLENHQEN R QKPKKRKK QEA EA GE+ 
Sbjct: 177 EQTEAKKNKRERKEERQKNRKKEKKELKLENHQENLRGQKPKKRKKNQEAGHEAAGEDGA 236 

Query: 240 EANG SAGKRSKKKKQRKDSASEEEA RVGAGKRKR- RHS EVETDSKKKKM 287 

+ +G G+ S++ R E+ A + AGKRKR +HS E+ KKKKM 

Sbjct: 237 DGSGPPEKKKAQGGQASEEGADRNGGPGEDRAEGQTKTAAGKRKRPKHSGAESGYKKKKM 296 

Query: 288 KLPEHPEGGEPEDDEAPAKGKFNWKGTIKAILKQAPDNEITIKKLRKKVLAQYYTVTDEH 347 

KLPE PE GE +D EAP+KGKFNWKGTIKA+LKQAPDNEI ++KKL+KKV+AQY+ V ++ 
Sbjct: 297 KLPEQPEEGEAKDHEAPSKGKFNWKGTIKAVLKQAPDNEISVKKLKKKVIAQYHAVMNDT 356 

Query: 348 HRSEEELLVIFNKKISKNPTFKLLKDKVKLVK 379

EEELL IFN+KIS+NPTFK+LKD+VKL+K
Sbjct: 357 SHHEEELLAIFNRKISRNPTFKVLKDRVKLLK 388
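The identity counts reported for an HSP can be reproduced column by column. The sketch below recounts identities for the first 60-column block of the LYAR alignment above (Query 1-60 against Sbjct 1-60); the function names are hypothetical, and the truncation of percentages is inferred from the figures in this listing rather than documented.

```python
QUERY = "MVFFTCNACGESVKKIQVEKHVSVCRNCECLSCIDCGKDFWGDDYKNHVKCISEDQKYGG"
SBJCT = "MVFFTCNACGESVKKIQVEKQVSNCRNCECLSCIDCGKDFWGDDYKSHVKCISEGQKYGG"


def count_identities(query, sbjct):
    """Count aligned columns holding the same residue, ignoring gap columns."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    return sum(q == s and q != "-" for q, s in zip(query, sbjct))


def truncated_percent(part, whole):
    """Integer percentage, truncated rather than rounded."""
    return 100 * part // whole


print(count_identities(QUERY, SBJCT), len(QUERY))
print(truncated_percent(275, 388), truncated_percent(317, 388))
```

Truncation reproduces the 70% and 81% quoted for 275/388 and 317/388, whereas rounding would give 71% and 82%.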



Pedant information for DKFZphtes3_14g5, frame 3

Report for DKFZphtes3_14g5.3



[LENGTH] 379 

[MW] 43634.03 

[pI] 9.59

[HOMOL] PIR:A40683 cell growth regulating nucleolar protein LYAR - mouse 1e-122

[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YCR087c-a] 2e-11

[BLOCKS] BL00603D Thymidine kinase cellular-type proteins 

[BLOCKS] BL00530C 
[PROSITE] ATP_GTP_A 1 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 18.73 % 

SEQ MVFFTCNACGESVKKIQVEKHVSVCRNCECLSCIDCGKDFWGDDYKNHVKCISEDQKYGG 

SEG 

PRD ccccccccccccchhhhhhhheeecccccceeeccccccccccccccceeeeeccccccc 

SEQ KGYEGKTHKGDIKQQAWIQKISELIKRPNVSPKVRELLEQISAFDNVPRKKAKFQNWMKN 

SEG 

PRD cccccccccchhhhhhhhhhhhhhhhhccchhhhhhhhhhhhcccccchhhhhhhhhhhc 

SEQ SLKVHNESILDQVWNIFSEASNSEPVNKEQDQRPLHPVANPHAEISTKVPASKVKDAVEQ 

SEG 

PRD cccccchhhhhhhhhhhhhhhcchhhhhhhhcccccccccccccceeecccccchhhhhh 

SEQ QGEVKKNKRERKEERQKKRKREKKELKLENHQENSRNQKPKKRKKGQEADLEAGGEEVPE 

SEG . . . .xxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhchhhhhccccccc 

SEQ ANGSAGKRSKKKKQRKDSASEEEARVGAGKRKRRHSEVETDSKKKKMKLPEHPEGGEPED 

SEG . . xxxxxxxxxxxxxxxxxx xxxxxxxxxxx 

PRD cccccccchhhhhhhhccchhhhhhhhhcccccccccccccchhhhhhcccccccccccc 

SEQ DEAPAKGKFNWKGTIKAILKQAPDNEITIKKLRKKVLAQYYTVTDEHHRSEEELLVIFNK 

SEG xxxxx : 

PRD cccccceeeehhhhhhhhhhhccccccchhhhhhhhhhhhhhhccchhhhhhhhhhhhhh 

SEQ KISKNPTFKLLKDKVKLVK 

SEG xxxxxxxxxxx 

PRD ccccccchhhhhhhhhccc 



Prosite for DKFZphtes3_14g5.3
PS00017 60->68 ATP_GTP_A PDOC00017



(No Pfam data available for DKFZphtes3_14g5.3)

DKFZphtes3_14h21



group: nucleic acid management 

DKFZphtes3_14h21 encodes a novel 648 amino acid protein with strong similarity to Mus musculus
RNA helicase and several RNA-dependent ATPases from the DEAD box family.

RNA helicases comprise a large family of proteins that are involved in basic biological
systems such as nuclear and mitochondrial splicing processes, RNA editing, rRNA processing,
translation initiation, nuclear mRNA export, and mRNA degradation. RNA helicases are essential
factors in cell development and differentiation, and some of them play a role in transcription
and replication of viral single-stranded RNA genomes. The members of the largest subgroup, the
DEAD and DEAH box proteins, exhibit a strong dependence of the unwinding activity on ATP
hydrolysis. The novel protein contains a DEAD-box and an ATP/GTP-binding site motif A (P-loop)
and is a new member of this subgroup.

The new protein can find application in modulating RNA metabolism and gene expression. 



strong similarity to RNA helicases 

start at Bp 33 matches Kozak consensus ACNatg 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 2200 bp 

Poly A stretch at pos. 2166, polyadenylation signal at pos. 2140



1 CAACGACGTC GGACGCGCCC CTTCTTGGAA CAATGTCCCA CCACGGAGGA 
51 GCTCCCAAGG CCTCTACGTG GGTCGTTGCT AGTCGGCGAA GCTCGACAGT 
101 GTCCCGAGCG CCAGAGAGGA GGCCGGCGGA GGAGTTGAAT CGAACAGGTC 
151 CTGAGGGATA TAGTGTCGGC AGAGGTGGTC GCTGGAGAGG CACCTCTAGG 
201 CCCCCGGAGG CCGTGGCCGC TGGTCACGAG GAACTGCCGC TGTGTTTTGC 
251 TTTGAAGAGC CACTTTGTTG GCGCGGTAAT CGGTCGTGGT GGGTCAAAAA 
301 TAAAGAATAT ACAAAGTACA ACAAACACCA CAATCCAAAT AATACAAGAA 
351 CAACCAGAAT CATTAGTCAA AATTTTTGGC AGCAAGGCAA TGCAAACGAA 
401 AGCAAAAGCA GTGATAGACA ATTTTGTTAA AAAGCTAGAA GAAAATTACA 
451 ATTCAGAATG CGGAATTGAT ACTGCATTCC AACCTTCTGT TGGAAAAGAT 
501 GGAAGCACAG ATAACAATGT TGTTGCAGGA GATCGGCCAT TGATAGATTG 
551 GGATCAAATT AGAGAGGAAG GTTTGAAATG GCAAAAAACA AAGTGGGCAG 
601 ATTTACCACC AATTAAGAAA AACTTTTATA AAGAGTCCAC TGCCACAAGT 
651 GCCATGTCAA AAGTAGAAGC AGATAGTTGG AGGAAAGAAA ATTTTAATAT 
701 AACGTGGGAT GACTTGAAGG ATGGGGAGAA ACGACCTATC CCCAATCCTA 
751 CCTGCACATT TGATGACGCC TTTCAATGTT ATCCTGAGGT TATGGAAAAC 
801 ATTAAAAAGG CAGGTTTTCA AAAGCCAACA CCTATTCAGT CACAGGCATG 
851 GCCCATTGTG TTGCAAGGAA TAGATCTTAT AGGAGTAGCC CAGACTGGAA 
901 CAGGAAAGAC ATTGTGTTAT TTAATGCCTG GATTTATTCA TCTGGTCCTT 
951 CAACCCAGCC TTAAAGGTCA AAGGAATAGA CCCGGCATGT TAGTTCTAAC 
1001 TCCCACTCGG GAATTAGCAC TTCAAGTAGA AGGAGAATGT TGCAAATATT 
1051 CATATAAAGG GCTTCGGAGT GTTTGTGTAT ATGGTGGTGG AAATAGAGAT 
1101 GAACAAATAG AAGAGCTTAA AAAAGGTGTA GATATCATAA TTGCAACTCC 
1151 CGGAAGATTG AATGATCTGC AAATGAGTAA CTTCGTCAAT CTGAAGAATA 
1201 TAACCTACTT GGTTTTAGAT GAAGCAGACA AGATGTTGGA CATGGGATTT 
1251 GAACCCCAGA TAATGAAGAT TTTGTTAGAT GTGCGCCCAG ATAGGCAGAC 
1301 AGTTATGACC AGTGCTACAT GGCCTCATTC AGTTCATCGC CTCGCACAAT 
1351 CTTATTTGAA AGAACCAATG ATTGTCTATG TTGGTACATT GGATCTAGTT 
1401 GCTGTAAGTT CAGTGAAGCA AAATATAATT GTAACCACCG AGGAAGAGAA 
1451 ATGGAGTCAC ATGCAAACTT TTCTACAGAG TATGTCATCC ACAGACAAAG 
1501 TCATTGTCTT CGTTTCTCGA AAAGCTGTTG CGGATCACTT ATCAAGTGAC 
1551 CTAATACTTG GAAATATATC AGTAGAGTCT CTGCATGGAG ATAGAGAACA 
1601 GAGAGATCGG GAGAAAGCAT TAGAGAACTT TAAAACAGGC AAAGTGAGAA 
1651 TACTAATTGC AACTGATCTA GCCTCTAGAG GACTTGATGT CCATGACGTT 
1701 ACACATGTCT ATAATTTTGA CTTTCCACGG AATATTGAAG AATACGTACA 
1751 CCGAATAGGG CGCACGGGAA GAGCAGGGAG GACTGGTGTT TCCATTACAA 
1801 CTTTGACTAG AAATGATTGG AGGGTTGCCT CTGAATTGAT TAATATTCTG 
1851 GAAAGAGCAA ATCAGAGTAT TCCAGAGGAG CTTGTATCAA TGGCTGAGAG 
1901 GTTTGAGGCA CATCAACGGA AAAGGGAAAT GGAAAGAAAA ATGGAAAGAC 
1951 CTCAAGGAAG GCCCAAGAAG TTTCATTAAT GTCTTCTGTA CTAGTGGGGT 
2001 AGAGAATTCA AGATTTTTTA GAAATATAGT AAGACAGAAG TATTGGACAT 
2051 GTTGGCAGTA TGAAGAGACC GGACTGATTT GACTGATTCT TAAAATAATA 
2101 GTGTTTGAAA ATATAGAATC CAGTGTTTTA TACTTTCTTT AATAAAAATA 
2151 GAAGTATTTA AACTTGAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
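The annotation "start at Bp 33 matches Kozak consensus ACNatg" can be checked on the first 50 bases of the insert. This is a minimal sketch that reads the quoted consensus literally as A, C, any base, then ATG; the regular expression is an assumption based on that wording only.

```python
import re

# First 50 bases of the insert, copied from the listing.
HEAD = "".join("CAACGACGTC GGACGCGCCC CTTCTTGGAA CAATGTCCCA CCACGGAGGA".split())

# "ACNatg": A, C, any nucleotide, followed by the start codon (captured).
KOZAK = re.compile(r"AC[ACGT](ATG)")

match = KOZAK.search(HEAD)
orf_start = match.start(1) + 1 if match else None  # 1-based position of the A of ATG
print(orf_start, HEAD[orf_start - 4:orf_start + 2] if orf_start else None)
```

The 1-based start found this way coincides with the ORF start reported for frame 3.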



BLAST Results

No BLAST result



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 33 bp to 1976 bp; peptide length: 648 
Category: strong similarity to known protein 
Classification: Nucleic acid management 
Prosite motifs: ATP_GTP_A (286-294)
DEAD_ATP_HELICASE (394-403)



1 MSHHGGAPKA STWVVASRRS STVSRAPERR PAEELNRTGP EGYSVGRGGR 
51 WRGTSRPPEA VAAGHEELPL CFALKSHFVG AVIGRGGSKI KNIQSTTNTT 
101 IQIIQEQPES LVKIFGSKAM QTKAKAVIDN FVKKLEENYN SECGIDTAFQ 
151 PSVGKDGSTD NNVVAGDRPL IDWDQIREEG LKWQKTKWAD LPPIKKNFYK
201 ESTATSAMSK VEADSWRKEN FNITWDDLKD GEKRPIPNPT CTFDDAFQCY
251 PEVMENIKKA GFQKPTPIQS QAWPIVLQGI DLIGVAQTGT GKTLCYLMPG 
301 FIHLVLQPSL KGQRNRPGML VLTPTRELAL QVEGECCKYS YKGLRSVCVY 
351 GGGNRDEQIE ELKKGVDIII ATPGRLNDLQ MSNFVNLKNI TYLVLDEADK 
401 MLDMGFEPQI MKILLDVRPD RQTVMTSATW PHSVHRLAQS YLKEPMIVYV 
451 GTLDLVAVSS VKQNIIVTTE EEKWSHMQTF LQSMSSTDKV IVFVSRKAVA 
501 DHLSSDLILG NISVESLHGD REQRDREKAL ENFKTGKVRI LIATDLASRG 
551 LDVHDVTHVY NFDFPRNIEE YVHRIGRTGR AGRTGVSITT LTRNDWRVAS 
601 ELINILERAN QSIPEELVSM AERFEAHQRK REMERKMERP QGRPKKFH 
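The two Prosite motifs listed for this peptide can be located with regular expressions. The sketch below assumes the PROSITE consensus patterns as commonly cited for PS00017 ([AG]-x(4)-G-K-[ST]) and PS00039 ([LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN]); these patterns are not given in the listing itself and should be checked against PROSITE. The segment is residues 251-450 copied from the peptide above, and the helper name is hypothetical.

```python
import re

SEGMENT_START = 251  # 1-based position of the first residue in SEGMENT
SEGMENT = "".join("""
PEVMENIKKA GFQKPTPIQS QAWPIVLQGI DLIGVAQTGT GKTLCYLMPG
FIHLVLQPSL KGQRNRPGML VLTPTRELAL QVEGECCKYS YKGLRSVCVY
GGGNRDEQIE ELKKGVDIII ATPGRLNDLQ MSNFVNLKNI TYLVLDEADK
MLDMGFEPQI MKILLDVRPD RQTVMTSATW PHSVHRLAQS YLKEPMIVYV
""".split())

# Assumed PROSITE patterns rewritten as regular expressions.
PATTERNS = {
    "ATP_GTP_A": r"[AG].{4}GK[ST]",
    "DEAD_ATP_HELICASE": r"[LIVMF]{2}DEAD[RKEN].[LIVMFYGSTN]",
}


def scan(segment, segment_start, patterns):
    """Return {motif: (first residue, last residue)} in 1-based peptide coordinates."""
    hits = {}
    for name, pattern in patterns.items():
        m = re.search(pattern, segment)
        if m:
            hits[name] = (m.start() + segment_start, m.end() + segment_start - 1)
    return hits


hits = scan(SEGMENT, SEGMENT_START, PATTERNS)
print(hits)
```

With these patterns the matches begin at residues 286 and 394, as in the report; the second coordinate printed in the listing (294, 403) appears to be one past the last matched residue.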

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_14h21, frame 3 

TREMBL:CEY54G11A_9 gene: "Y54G11A.3"; Caenorhabditis elegans cosmid
Y54G11A, N = 1, Score = 1008, P = 1.1e-101

TREMBL:SPBP8B7_16 gene: "dbp2"; "SPBP8B7.16c"; product: "p68-like
protein."; S.pombe chromosome II P1 p8B7., N = 1, Score = 971, P =
9.1e-98

PIR:S13757 RNA helicase DBP2 - yeast (Saccharomyces cerevisiae), N = 1,
Score = 970, P = 1.2e-97

PIR:S14048 RNA helicase dbp2 - fission yeast (Schizosaccharomyces
pombe), N = 1, Score = 961, P = 1e-96

PIR:A57514 RNA helicase HEL117 - rat, N = 2, Score = 888, P = 7.8e-91



>TREMBL:CEY54G11A_9 gene: "Y54G11A.3"; Caenorhabditis elegans cosmid
Y54G11A

Length = 504

HSPs:

Score = 1008 (151.2 bits), Expect = 1.1e-101, P = 1.1e-101
Identities = 211/473 (44%), Positives = 298/473 (63%)



Query: 174 DQIREEGLKWQKTKWADLPPIKKNFYKESTATSAMSKVEADSWRKENFNITWDDLKDGEK 233
PI ++ YK +S + + ++
Sbjct: 23 -PIVRDLYKIPNEQKNLSPEQLQELYTNGGVMKVYPFREEST 75

Query: 234 RPIPNPTCTFDDAFQCYPEVMENIKKAGFQKPTPIQSQAWPIVLQGIDLIGVAQTGTGKT 293
IP P +F+ AF +M I+K GF+KP+PIQSQ WP++L G D IGV+QTG+GKT
Sbjct: 76

Query: 294 LCYLMPGFIHLVLQPSL-----KGQRNRPGMLVLTPTRELALQVEGECCKYSYKGLRSVC 348
L +L+P +H+ Q + + Q+ P +LVL+PTRELA Q+EGE KYSY G +SVC
Sbjct: 136 LAFLLPALLHIDAQLAQYEKNDEEQKPSPFVLVLSPTRELAQQIEGEVKKYSYNGYKSVC 195

Query: 349 VYGGGNRDEQIEELKKGVDIIIATPGRLNDLQMSNFVNLKNITYLVLDEADKMLDMGFEP 408
+YGGG+R EQ+E + GV+I+IATPGRL DL ++L ++TY+VLDEAD+MLDMGFE

Sbjct: 196 LYGGGSRPEQVEACRGGVEIVIATPGRLTDLSNDGVISLASVTYVVLDEADRMLDMGFEV 255

Query: 409 QIMKILLDVRPDRQTVMTSATWPHSVHRLAQSYLKEPMIVYVGTLDLVAVSSVKQNIIVT 468

I + IL ++RPDR +TSATWP V +L Y KE ++ G+LDL + SV Q 
Sbjct: 256 AIRRILFEIRPDRLVALTSATWPEGVRKLTDKYTKEAVMAVNGSLDLTSCKSVTQFFEFV 315 

Query: 469 TEEEKW---SHMQTFLQSMSSTD-KVIVFVSRKAVADHLSSDLILGNISVESLHGDREQR 524

+ ++ + FL + + K+I+FV K +ADHLSSD + I+ + LHG R Q

Sbjct: 316 PHDSRFLRVCEIVNFLTAAHGQNYKKIIFVKSKVMADHLSSDFCMKGINSQGLHGGRSQS 375 

Query: 525 DREKALENFKTGKVRILIATDLASRGLDVHDVTHVYNFDFPRNIEEYVHRIGRTGRAGRT 584 

DRE +L ++G+V+IL+ATDLASRG+DV D+THV N+DFP +IEEYVHR+GRTGRAGR 
Sbjct: 376 DREMSLNMLRSGEVQILVATDLASRGIDVPDITHVLNYDFPMDIEEYVHRVGRTGRAGRK 435 

Query: 585 GVSITTLTRNDWRVASELINILERANQSIPEELVSMAERFEAHQRKREMERKMERPQGRP 644 

G +++ L ND LI ILE++ Q +P++L AE++ K + R RP R 

Sbjct: 436 GEAMSFLWWNDRSNFEGLIQILEKSEQEVPDQLRRDAEKYRL---KCQSGRDGPRPSFRN 492

Query: 645 KK 646 
K 

Sbjct: 493 NK 494 
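Each HSP in these reports quotes both an Expect value and a P value. The pairs are consistent with the usual Poisson relation P = 1 - exp(-E); the sketch below applies that relation (an assumption, since the listing does not state how P was derived) to Expect values taken from this listing and prints them in the same one-decimal exponent format.

```python
import math


def p_from_expect(expect):
    """Probability of at least one chance hit, given the expected number of hits."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    return -math.expm1(-expect)


# Expect values copied from HSP headers in this listing.
EXPECTS = [1.1e-101, 3.9e-04, 3.2e-03, 6.4e+00]
FORMATTED = [f"{p_from_expect(e):.1e}" for e in EXPECTS]
print(FORMATTED)
```

For small Expect values the two numbers coincide, while an Expect of 6.4 saturates at a P of 1.0, which is the pattern seen in the report headers.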

Pedant information for DKFZphtes3_14h21, frame 3 



Report for DKFZphtes3_14h21.3

[LENGTH] 648

[MW] 72873.51 

[pI] 8.84

[HOMOL] TREMBL:CEY54G11A_9 gene: "Y54G11A.3"; Caenorhabditis elegans cosmid Y54G11A 1e-101

[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YNL112w] 2e-97

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YNL112w] 2e-97

[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPL119c] 4e-72 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YOR204w] 2e-70 

[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, 
YOR204w] 2e-70 

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YBR237w] 1e-61

[FUNCAT] l genome replication, transcription, recombination and repair [H.
influenzae, HI0892] 2e-49

[FUNCAT] j mrna translation and ribosome biogenesis [H. influenzae, HI0231 RNA] 1e-48

[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YDL160C] 9e-45

[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YMR290c] 3e-44

[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YJL033w] 2e-36

[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YOR046c] 7e-32

[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YDR194c] 2e-28

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGL064c] 5e-10 

[FUNCAT] 11.10 cell death [S. cerevisiae, YMR190c] 2e-08 

[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YMR190c] 2e-08 

[FUNCAT] r general function prediction [M. jannaschii, MJ1401] 1e-07

[BLOCKS] BL00039D DEAD-box subfamily ATP-dependent helicases proteins 

[BLOCKS] BL00039C DEAD-box subfamily ATP-dependent helicases proteins

[BLOCKS] BL00039B DEAD-box subfamily ATP-dependent helicases proteins 

[BLOCKS] BL00039A DEAD-box subfamily ATP-dependent helicases proteins 

[PIRKW] nucleus 4e-96 

[PIRKW] RNA binding 3e-87 

[PIRKW] DEAD box 5e-50 

[PIRKW] transmembrane protein 4e-27 

[PIRKW] DNA binding 3e-67 

[PIRKW] recF recombination pathway 3e-10 

[PIRKW] ATP 4e-96 

[PIRKW] purine nucleotide binding 5e-50 

[PIRKW] P-loop 4e-96 

[PIRKW] hydrolase 9e-45 

[PIRKW] protein biosynthesis 5e-50 

[PIRKW] ATP binding 1e-61

[SUPFAM] WW repeat homology 8e-88 

[SUPFAM] DEAD/H box helicase homology 4e-96 

[SUPFAM] unassigned DEAD/H box helicases 7e-87 

[SUPFAM] ATP-dependent RNA helicase DBP1 4e-96 

[SUPFAM] ATP-dependent RNA helicase DHH1 2e-43 

[SUPFAM] recQ protein 3e-10 

[SUPFAM] Bloom's syndrome helicase 5e-07 

[SUPFAM] translation initiation factor eIF-4A 5e-50 

[SUPFAM] recQ helicase homology 3e-10 

[SUPFAM] tobacco ATP-dependent RNA helicase DB10 8e-88 

[PROSITE] DEAD_ATP_HELICASE 1
[PROSITE] ATP_GTP_A 1

[PFAM] Helicases conserved C-terminal domain

[PFAM] KH domain family of RNA binding proteins

[PFAM] DEAD and DEAH box helicases

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 8.49 %

SEQ MSHHGGAPKASTWVVASRRSSTVSRAPERRPAEELNRTGPEGYSVGRGGRWRGTSRPPEA 

SEG xxxxxxxxxxxxxxxxx 

PRD cccccccccceeeeeecccccccccccccccccccccccccccccccccccccccccccc 

SEQ VAAGHEELPLCFALKSHFVGAVIGRGGSKIKNIQSTTNTTIQIIQEQPESLVKIFGSKAM

SEG xxxxxxxxxxxxxxx 

PRD cccccccccchhhhhcccceeeecccccccccccccccceeeeecccccceeeeeccchh 

SEQ QTKAKAVIDNFVKKLEENYNSECGIDTAFQPSVGKDGSTDNNVVAGDRPLIDWDQIREEG 

SEG 

PRD hhhhhhhhhhhhhhhhhhhccccccccccccccccccccccccccccccccccccccccc 

SEQ LKWQKTKWADLPPIKKNFYKESTATSAMSKVEADSWRKENFNITWDDLKDGEKRPIPNPT 

SEG 

PRD chhhhhhhcccccccccccccccccchhhhhhhhhhhhhhheeeeecccccccccccccc 

SEQ CTFDDAFQCYPEVMENIKKAGFQKPTPIQSQAWPIVLQGIDLIGVAQTGTGKTLCYLMPG

SEG 

PRD ccccccccccchhhhhhhhhhcccccccccccccccccccceeeeeecccccceeeecce 

SEQ FIHLVLQPSLKGQRNRPGMLVLTPTRELALQVEGECCKYSYKGLRSVCVYGGGNRDEQIE 

SEG 

PRD eeeeccccccccccccceeeeeccchhhhhhhhhhhhhhhccceeeeeeccccccchhhh 

SEQ ELKKGVDIIIATPGRLNDLQMSNFVNLKNITYLVLDEADKMLDMGFEPQIMKILLDVRPD 

SEG 

PRD hhhhceeeeeeccccchhhhhhhccccccceeeehhhhhhhhhcccchhhhhhhhhhccc 

SEQ RQTVMTSATWPHSVHRLAQSYLKEPMIVYVGTLDLVAVSSVKQNIIVTTEEEKWSHMQTF 

SEG 

PRD ceeeeeecccchhhhhhhhhhhhheeeeeecccccccccccceeehhhhhchhhhhhhhh 

SEQ LQSMSSTDKVIVFVSRKAVADHLSSDLILGNISVESLHGDREQRDREKALENFKTGKVRI 

SEG 

PRD hhhhcccceeeeeeehhhhhhhhhhhhhhcccceeecccccchhhhhhhhhhhhccccee 

SEQ LIATDLASRGLDVHDVTHVYNFDFPRNIEEYVHRIGRTGRAGRTGVSITTLTRNDWRVAS

SEG xxxxxxxxxxxx 

PRD eeehhhhhhcccccceeeeeeeccccccccceeeecccccccccceeeeeeccccchhhh 

SEQ ELINILERANQSIPEELVSMAERFEAHQRKREMERKMERPQGRPKKFH

SEG xxxxxxxxxxx 

PRD hhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhccccccccccc 



Prosite for DKFZphtes3_14h21.3

PS00017 286->294 ATP_GTP_A PDOC00017
PS00039 394->403 DEAD_ATP_HELICASE PDOC00039



Pfam for DKFZphtes3_14h21.3



HMM_NAME DEAD and DEAH box helicases 

HMM *gLpPWILRnIyeMGFEkPTPIQQqAIPiILeGRDVMACAQTGSGKTAAF 
P++++NI+++GF KPTPIQ+QA+PI+L+G D+++ AQTG+GKT+++ 
Query 248 QCYPEVMENIKKAGFQKPTPIQSQAWPIVLQGIDLIGVAQTGTGKTLCY 296 

HMM UPMLQHIDwdPWpqpPQd. . PrALILAPTRELAMQIQEEcRkFgkHMng 

L+P ++H+ +P +++ Q+ P +L+L+PTRELA+Q++ EC K+++ + 
Query 297 LMPGFIHLVLQP-SLKGQRNRPGMLVLTPTRELALQVEGECCKYSYK-G- 343 

HMM IRImcIYGGtnMRdQMRmLeRGpPHIVIATPGRLIDHIERgtldLDrleM 
+R++C+YGG N ++Q+++L++G+ +I+IATPGRL D+ +++ ++L++I++ 
Query 344 LRSVCVYGGGNRDEQIEELKKGV-DIIIATPGRLNDLQMSNFVNLKNITY 392 

HMM LVMDEADRMLDMGFIDQIRrlMrqIPMpwNRQTMMFSATMPdelqELARr 

LV+DEAD+MLDMGF++QI++I+ ++ +-+RQT+M SAT+P ++ +LA 
Query 393 LVLDEADKMLDMGFEPQIMKILLDVR--PDRQTVMTSATWPHSVHRLAQS 440

HMM FMRNPIRInld.MdElTtnEnlkQwYiyVerEMWKfdcLcrLIe*
++++P + ++ D +++ +KQ +I+ E++K + ++++
Query 441 YLKEPMIVYVGTLDLVAVS-SVKQNIIVTT-EEEKWSHMQTFLQ 482



HMM_NAME KH domain family of RNA binding proteins

HMM *rIiIPedhMGMIIGKGGsNIRqIREEYgvrINIPdecCeDstdRIITIt
+ + ++++G++IG+GGS I++I++ ++++I I++E+ + + + I
Query 71 CFALKSHFVGAVIGRGGSKIKNIQSTTNTTIQIIQEQ-P---ESLVKIF 115

HMM
G
Query 116 G 116



HMM_NAME Helicases conserved C-terminal domain

HMM *EileeWLknl GI r vmYI HGdMpQeERde I MddFNnGEynVLI cTD 

+ +++ L+ + +I+V ++HGD++Q++R++++++F++G+ ++LI+TD 
Query 497 KAVADHLSSDLILGNISVESLHGDREQRDREKALENFKTGKVRILIATD 545 

HMM VggRGIDIPdVNHVINYDMPWNPEqYIQRIGRTgRIG*

+++RG+D+ DV HV+N+D+P+N+E Y++RIGRTGR+G
Query 546 LASRGLDVHDVTHVYNFDFPRNIEEYVHRIGRTGRAG 582

DKFZphtes3_14p14



group: testes derived 

DKFZphtes3_14p14 encodes a novel 159 amino acid protein without similarity to known proteins.
No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.



unknown 

complete cDNA, complete cds, few EST hits 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 3969 bp 

Poly A stretch at pos. 3948, polyadenylation signal at pos. 3927 



1 GAAGCCCAGG CTCTCCTTAG TTGACTGTGT GTTAATCACC CAGCAATTTC 
51 ATTACTCAAC AGCTCTCCAG AGTTGCACAT TACAGCTGGG GTAGAAATTG 
101 GGTGCTGAAG GCCAGGCAGA GCATTTGGCT GTAGGGAGGC CGATCCTCCT 
151 CGGGCCTGTT ACCGGCGGGT CTTTGTTCTT AGACCTGGGG TTCTTGGCCT 
201 CACGGATTCC AAGGAATGGA ACGTTGGGCC ATGCGTGTGA ACGAGCTCTA 
251 TGTCGATGAC CCAGACAAGG ACAGCGGTGG CAAGATCGAC GTCAGTCTGA 
301 ACATCAGTTT ACCCAATCTG CACTGCGAGT TGGTTGGGCT TGACATTCAG 
351 GATGAGATGG GCAGGCACGA AGTGGGCCAC ATCGACAACT CCATGAAGAT 
401 CCCGCTGAAC AATGGGGCAG GCTGCCGCTT CGAGGGGCAG TTCAGCATCA 
451 ACAAGGTATG GAAGCCCTGC CTCAGCCCTT TCTACCTGCT CCCCTTTCCT 
501 GCTGTCTCCC CGCTCCCTGG AAACTGGTTG TGGAGGCACT CACTCGACCT 
551 GACCCTGACA CAGCCCCCAG CAAGCGAGGG TTCGTGTCCA GCTGCCTGGC 
601 CGTTCCTGCT GAGAATCTGG ATGGGGGTCC AGGCTCCCTG GGGTTTTAAG 
651 CCCCTGATGG CTGGTTCAGG AAGGAGCTAC TCTTCTCTCC AGTGAGGGGG 
701 ACAATGATGA GAAGACCTGA GGATTTGCAG CCCCCAGCCC TGGGTTCAAG 
751 TCCCAGCTCT ACCCCTTCTT GGCCCCTACA AGTCACTTGA CCCATCTTAG 
801 GCTGAGGGTG TGATGGCGAT AATAGTATCA CGATACCACC CACTTCACAA 
851 AGTTTGTGTG GGGATTAAAT GAGCTAATGC AGATTCATTC ATTCAGAAAA 
901 ATTTTTGAAT GGCACGTTCT GTGTTCCAGG GTCGGTGATA GGCTCTGGGG 
951 CAGCGTTCCT GGGCTGGTGG GGCTCCCATT CTGGTAGAGG GAGACAGTCT 
1001 ACAAACCAGA AAGCATCAGG GATGCTAAGT GCAGTGATGA GGAATAAAGC 
1051 CAAGGGGAGT GAGATGAGGT GGGCTTGAAA GTACCTTGTC CGCTCAGAAG 
1101 GACCATTCAA GGTTCACTGT TGTTTTGTCC TCAGAACCAG GAGCTTCAGA 
1151 TCCTAAGTCA AGTGGGTGAA CGCAGTGCCC TTGGGAGGGC CGAGGCACCC 
1201 GGTGGCAGCT GGCAGGGTTT TGCTCAGCAC GTGCCGGCCT TCCTCGAAGC 
1251 TCGGTACTGT CACAGTGGAG CCTCTCAACA ACGCTGTGAG GCAGCACCAT 
1301 TTGACAGGTT AGGATGCTGG GGCCCAGAGA GGTTAAGTGT CTTGCCCGAG 
1351 GTCACACAGC TATCTGCATG TCCCACAACT CCCCTTCCCA GCCCCAGCCA 
1401 AACTGAGCCA CTGGCCACTC CTGGCTTCTC CTTGTCCCTC CTGCAGCCTC 
1451 TGCTCAGAAC GCCCTTCCTC CAGACCCTGA CACCTGAGCT GGGGTTGCAA
1501 AGTCACTGGC CACATCCAGC CCAAAGATAA ATTTTGTTTG TCCAGTATAG 
1551 CATTTAACTG CATCAGAACC AGTATGAAAA GACCAGGAAT CCAGATTTCT 
1601 GGCTTTTAAA AGTCAGAGGC TCTCACTACA CTGGGTCCGT GTTCCCGCTA 
1651 TGACAATGAC CTGGCACCAA TGGGCAGTGT TCCCCTTTAG AGAGGGTGTG 
1701 TGCTGTCCCT TCCCACAGTC CCTGGCAGGC GGCTGGAAGG CCAGGCCTGG 
1751 TCATCTGTCA AGCAGGGTGG ACTTCTTACG TGACAGTTCA GGGCTCCCTT 
1801 AAGTGCTAAA GCAGAAGCTG CAAGGCTTTC TTAAGGTTTC GAGTGTTGCT 
1851 GGGAGAAATC TGCTGCATGT TGTGGGTTAA AGGGAGTCTC TCACCAGCCC 
1901 AGGCCCTCAq GAGGAGGAGA TACCAGGAGG CAGGGATGCT GGGGGTCGTG 
1951 GTTCACTGGG GGCTCTCTCT GCCCATGAGC TGCCACACAG CACCTTTGCC 
2001 ATGCCCCGTA ATTTGGATTT TATGGTGGTT GTGATGGAAA GCCATTTGAG 
2051 GGTTTTGAAC AGGGAGGCAA TGTAATCAGA TTTATGCCTT AGAACTGGAC 
2101 TATCCAATAG GTTGCCACCA GCCACATAAG GCTATTTAAA TTAATTCAAA 
2151 TTAAATGTAC AATTCAGTCA CTCATTCTCA TCAACCACAT TTCAAGTGCT 
2201 CAAAGCCACG TGCTGGCTAG GGGCCACAGC GTTAGACAGT GCAGAGAGAA 
2251 AGCACTTCCA TCGCTGAGGA AAGTTCTGCT GGACCGCACA CCCTTAGAAG 
2301 GATGGCTCTG GTGGCCGGGC GCGGTGGCTC AAACCTGTAA TCCCAGCACT 
2351 TTGGGAGGCC GAGGTGGGTG GATCACGAGG TCAGGAGATC GAGACCATCC 
2401 CGGCTAACAT GGTGAAACCC TGCCTCTACT AAAAATACAA AAAAAAACAA 
2451 AATTAGCCGG GCGTGGTTGC GGGCACCTGT AGTCCCAGCT ACTCAGGAGG 
2501 CTGAGGCGGG AGAATGGCAT GAACCCGGGA GGTGGAGCTT GCAGTGAGCC 
2551 AAGATCGTAC CACTGCACTC CAGTCTGGGC GACAGAGTGA GACTCCATCT 
2601 CAAAACAAAC AAAAAAAGGA TGGGGCTGGG CTGGAGAGGG TGGCAGGCAG 
2651 TGGTTGTGGC AGTGGAGCTG GGGAGATGTG GTCGGATTAG GGAGGTAGAA 
2701 TCAATAAGAC TCAGTGAAGA ATCGGATGTG GGGGTAAGGG CACATGTGGA
2751 AGCAAAGAAA CCTTTGACGT CTTTGTCTTG ACAACCGGGT GGTCCTGTTT
2801 CTAGACATGG AAGCTTAGAA AAGCCTGGAG TCTGTGGGAA GTAGGTAGGG 
2851 CTGGGCACTG GTCATTCCAC TCTGGTTTCC TTTGGGGTTC CCATTAGGTG 
2901 TCTACAGGGA GAGGTGAAAT TGGAAGTTGG AGGTGTGGAG AGTTCAGGAG 
2951 AGGGTTCTGG ACCACAGATG TTGAGGTGGG AGTCATTAGT GAATAGATGA 
3001 TGTTGGAAGT CATGGGTCCT CAGAGTGGGG GCTCCTTAAG CCTCCAGGCC 
3051 AGCAGCATCA GCATCACCTG GGAGATTGTT AGGAATGCAG ATTCTCAGGC 
3101 CCCCCTAAGA CCCACCGACT CTGTGCTAGA ACAAGCGCCC CTCAGAGATT 
3151 CTGATGCCAC TGAAGTTTGA GGAGCATTGG TTTAAGCAAG ATTACCTACG 
3201 GAGAGGCTGT AGATCCGTGT TCTAAACCTG GGGTCCACAG ACACCCCCAA 
3251 GAAGAGCGGA TTGAATGCAA GAGATCTATG AAGTTGGATG GGGGAAAAAT 
3301 TGACATCTTT ATTTTTGCTA AACTCGATCT AAAGTTTAGC ATTTCCATCT 
3351 GCGATGAATG TAGGCCACAA ACCACAGTAG TATTAGCAGT GCCTGGGACC 
3401 TCCTCAACAA CAGAAATTGC CGGTATTTAT AGCACGTTAC AGTTGTTGCA 
3451 GATAATTTCC AGAGACTGTT TATATGCACC ACTGTTTTAA AATTACGGTG 
3501 ATTGGCCAGG TGCAGTGGCT CACACCTGTA ATCCCAGCAC TTTGGGAGGC 
3551 CAAAGTGGGT GGATCACTTG AGGAGTTCAA GACCAGCCTG GTCAACATGT 
3601 CAAAACCCTG TATCTACAAA AAAATACAAA AGTTAACCAA GCCTATGCTT
3651 GTAGTCACAG CTACTCGGGA GGCCGAGGTG GGAGGGTCTT CTGAGCCCAG 
3701 GGAGGTAGAG GCTTCAGTGA GCTGAGATCG CACCACCACA CTCCAGCCTG 
3751 GGTGACAGAG TGAAACCCTT AATCAATCAG TCAATAAAAA TTACAGTAAT 
3801 TATTAGACCC ACCACTAGGT CATCTTATTT GATGCATCAG TAAAGCAGCA 
3851 TATTCAAATG TGGATTTTTA AATATTTTAA TTACTATTTA AATATCTCTT 
3901 TACTTTGTAA TCCTATGCAT TTTACGCATT AAAACATTTT AAGCATTTAA 
3951 AAAAAAAAAA AAAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 216 bp to 692 bp; peptide length: 159 
Category: putative protein 
Classification: no clue 
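The ORF coordinates, peptide lengths and reading frames quoted for the clones in this listing are mutually consistent if the ORF end is taken as the last base before the stop codon. A minimal sketch of that arithmetic follows; the tuples are copied from the "ORF from ... to ..." lines, and the frame formula is an assumption inferred from those values.

```python
# (ORF start, ORF end, reported peptide length, reported frame), 1-based coordinates.
ORFS = [
    (144, 1280, 379, 3),
    (33, 1976, 648, 3),
    (216, 692, 159, 3),
    (20, 2125, 702, 2),
]


def orf_summary(start, end):
    """Return (peptide length, reading frame) for an ORF that excludes its stop codon."""
    span = end - start + 1
    if span % 3:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3, (start - 1) % 3 + 1


results = [orf_summary(start, end) for start, end, _, _ in ORFS]
for (start, end, length, frame), computed in zip(ORFS, results):
    print(start, end, computed, computed == (length, frame))
```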



1 MERWAMRVNE LYVDDPDKDS GGKIDVSLNI SLPNLHCELV GLDIQDEMGR 
51 HEVGHIDNSM KIPLNNGAGC RFEGQFSINK VWKPCLSPFY LLPFPAVSPL 
101 PGNWLWRHSL DLTLTQPPAS EGSCPAAWPF LLRIWMGVQA PWGFKPLMAG 
151 SGRSYSSLQ 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_14p14, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphtes3_14p14, frame 3



Report for DKFZphtes3_14p14.3



[LENGTH] 159

[MW] 17778.55

[pI] 5.74

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YAL042w] 5e-04

[KW] Alpha_Beta



SEQ MERWAMRVNELYVDDPDKDSGGKIDVSLNISLPNLHCELVGLDIQDEMGRHEVGHIDNSM 

PRD ccchhhhhhhhccccccccccceeeeeeccccccccceeeehhhhhhcccceeecccccc 

SEQ KIPLNNGAGCRFEGQFSINKVWKPCLSPFYLLPFPAVSPLPGNWLWRHSLDLTLTQPPAS

PRD eeecccccceeecccccccccccccccccccccccccccccccccccccccccccccc

SEQ EGSCPAAWPFLLRIWMGVQAPWGFKPLMAGSGRSYSSLQ
PRD ccccccchhhhhhhhhhhccccccccccccccccccccc

(No Prosite data available for DKFZphtes3_14p14.3)
(No Pfam data available for DKFZphtes3_14p14.3)

DKFZphtes3_14p7



group: testes derived 

DKFZphtes3_14p7 encodes a novel 702 amino acid protein with very weak similarity to kinesin 
associated protein KAP3. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.



weak similarity to kinesin associated protein KAP3 

complete cDNA, complete cds, few EST hits 

Sequenced by BMFZ 

Locus: unknown

Insert length: 2497 bp 

Poly A stretch at pos. 2424, polyadenylation signal at pos. 2400 



1 GGAATCCAAA GAAACAGTTA TGATGGGGGA CTCTATGGTG AAAATAAATG 
51 GGATTTATTT AACAAAATCA AATGCTATTT GCCACTTAAA GAGTCACCCA 
101 CTTCAGCTAA CTGATGATGG AGGCTTCAGT GAAATAAAGG AGCAAGAAAT 
151 GTTCAAAGGA ACAACATCTT TACCATCTCA TCTCAAGAAT GGAGGGGACC 
201 AGGGGAAGAG ACATGCGAGG GCCTCATCAT GCCCCAGTAG CTCAGACCTG 
251 AGCAGGCTGC AAACCAAAGC AGTCCCAAAA GCTGACCTGC AAGAAGAGGA 
301 CGCAGAAATA GAAGTAGACG AAGTCTTTTG GAATACAAGG ATTGTACCGA 
351 TTTTGCGTGA ATTAGAAAAG GAAGAAAACA TTGAAACGGT TTGTGCTGCT 
401 TGCACACAAC TTCATCATGC TTTAGAGGAA GGAAACATGC TTGGAAATAA 
451 ATTTAAGGGA AGAAGTATTC TCCTGAAGAC CCTGTGTAAA CTAGTTGATG 
501 TTGGTTCAGA CTCGCTCAGC CTTAAACTTG CAAAAATAAT TCTAGCACTT 
551 AAAGTGAGTA GAAAGAATCT TCTTAATGTC TGCAAACTTA TATTTAAAAT 
601 TAGCAGGAAT GAGAAGAATG ATTCTTTGAT TCAAAATGAC AGCATTCTGG 
651 AATCATTATT GGAGGTACTA AGAAGTGAAG ACCTGCAAAC TAACATGGAA 
701 GCTTTTTTAT ACTGTATGGG GTCTATAAAG TTCATTTCTG GAAATCTGGG 
751 ATTTCTTAAT GAAATGATCA GCAAAGGTGC TGTGGAAATA CTGATAAATT 
801 TGATAAAACA AATAAATGAG AACATCAAGA AATGTGGTAC ATTTTTGCCT 
851 AATTCGGGCC ACTTGCTAGT CCAGGTGACT GCTACATTGA GAAACTTGGT 
901 TGATTCATCA TTAGTAAGAA GTAAGTTCCT AAACATCAGT GCCCTTCCCC 
951 AGCTCTGCAC GGCAATGGAA CAGTACAAGG GTGACAAGGA CGTCTGTACC 
1001 AATATTGCCA GAATATTCAG CAAACTTACT TCTTACCGTG ACTGCTGCAC 
1051 AGCCTTGGCC AGCTATTCCA GATGTTATGC CTTATTTCTG AATCTAATTA 
1101 ACAAATACCA GAAGAAGCAG GATTTAGTCG TCCGTGTTGT TTTTATTCTT 
1151 GGCAACCTGA CGGCAAAAAA TAACCAGGCT CGTGAACAAT TTTCCAAAGA 
1201 GAAAGGGAGC ATCCAAACTC TGCTGTCATT ATTCCAGACG TTCCATCAGC 
1251 TGGATCTGCA TTCCCAGAAG CCGGTGGGCC AACGAGGCGA GCAGCACAGG 
1301 GCGCAGAGGC CGCCGTCAGA GGCAGAGGAC GTGCTCATCA AGCTGACTCG 
1351 TGTGCTGGCC AACATTGCCA TCCACCCGGG CGTGGGCCCG GTGCTGGCCG 
1401 CCAACCCGGG GATAGTGGGC CTGCTCCTGA CCACGCTGGA ATACAAGTCA 
1451 CTTGATGATT GTGAGGAGCT GGTGATCAAT GCTACAGCGA CAATCAACAA 
1501 TTTATCTTAC TACCAAGTGA AGAATTCCAT AATTCAAGAC AAAAAGCTAT 
1551 ATATTGCTGA ATTGCTCTTA AAGCTTCTTG TCAGTAACAA CATGGATGGA 
1601 ATCCTGGAGG CTGTGCGTGT TTTCGGAAAT CTCTCCCAGG ACCATGATGT 
1651 CTGCGATTTC ATTGTGCAGA ACAATGTCCA CAGGTTCATG ATGGCGCTGC 
1701 TGGATGCTCA GCATCAGGAT ATCTGCTTTT CTGCCTGTGG TGTTCTCCTC 
1751 AATCTCACTG TGGATAAAGA CAAGCGTGTC ATCTTGAAAG AAGGAGGTGG 
1801 CATTAAAAAG TTAGTGGACT GTTTAAGAGA TTTGGGTCCT ACTGATTGGC 
1851 AGCTGGCCTG CTTGGTTTGT AAAACTTTAT GGAACTTCAG TGAAAACATC 
1901 ACTAATGCTT CGTCATGTTT TGGAAATGAA GACACCAACA CACTCTTACT 
1951 CTTGCTCTCA TCATTTTTAG ATGAAGAACT AGCACTGGAT GGCAGTTTTG 
2001 ATCCAGACCT AAAAAACTAT CACAAACTCC ATTGGGAAAC AGAATTCAAA 
2051 CCTGTGGCAC AGCAGCTTCT AAACCGAATT CAGAGACATC ACACCTTCCT 
2101 GGAACCCCTG CCCATTCCCT CTTTCTAACA TGATGCAGAT TAACAGTAGA 
2151 AACGAGAACT CACGTCTCCC TCATTCTTAA GAACTGGTAA CAAACGTGAA 
2201 CATTTTTTTC AGCATTAACA AATGTGGAAA GTTTTTCAAG AACTGGTTTT 
2251 AGTGAGTAGC TGAAGTATTT TTTAAAATTA AGCATTTCTT CTTGTTAGGT 
2301 ATTATGGAAA AATGAATATA CACATTATAT TTCCTGTTGA GAGAAATGTA 
2351 AGATGAAAAT ATGTGCATTT TCAAGTAAAT GACTTTTTCT TCTATTCTCT 
2401 ATTAAACAAT TTAGTTCTAG TCTTAAAAAA AAAAAAAAAA AAAAAAAAAA 
2451 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAA 



BLAST Results

No BLAST result



Medline entries



No Medline entry



Peptide information for frame 2 



ORF from 20 bp to 2125 bp; peptide length: 702 
Category: putative protein 



1 MMGDSMVKIN GIYLTKSNAI CHLKSHPLQL TDDGGFSEIK EQEMFKGTTS
51 LPSHLKNGGD QGKRHARASS CPSSSDLSRL QTKAVPKADL QEEDAEIEVD
101 EVFWNTRIVP ILRELEKEEN IETVCAACTQ LHHALEEGNM LGNKFKGRSI
151 LLKTLCKLVD VGSDSLSLKL AKIILALKVS RKNLLNVCKL IFKISRNEKN
201 DSLIQNDSIL ESLLEVLRSE DLQTNMEAFL YCMGSIKFIS GNLGFLNEMI
251 SKGAVEILIN LIKQINENIK KCGTFLPNSG HLLVQVTATL RNLVDSSLVR
301 SKFLNISALP QLCTAMEQYK GDKDVCTNIA RIFSKLTSYR DCCTALASYS
351 RCYALFLNLI NKYQKKQDLV VRVVFILGNL TAKNNQAREQ FSKEKGSIQT
401 LLSLFQTFHQ LDLHSQKPVG QRGEQHRAQR PPSEAEDVLI KLTRVLANIA
451 IHPGVGPVLA ANPGIVGLLL TTLEYKSLDD CEELVINATA TINNLSYYQV
501 KNSIIQDKKL YIAELLLKLL VSNNMDGILE AVRVFGNLSQ DHDVCDFIVQ
551 NNVHRFMMAL LDAQHQDICF SACGVLLNLT VDKDKRVILK EGGGIKKLVD
601 CLRDLGPTDW QLACLVCKTL WNFSENITNA SSCFGNEDTN TLLLLLSSFL
651 DEELALDGSF DPDLKNYHKL HWETEFKPVA QQLLNRIQRH HTFLEPLPIP
701 SF



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_14p7, frame 2 

TREMBL:MMD367_1 product: "KAP3B"; Mus musculus mRNA for KAP3B,
complete cds., N = 2, Score = 97, P = 0.00039



>TREMBL:MMD367_1 product: "KAP3B"; Mus musculus mRNA for KAP3B, complete
cds.

Length = 772 

HSPs: 

Score = 97 (14.6 bits), Expect = 3.9e-04, Sum P(2) = 3.9e-04
Identities = 45/163 (27%), Positives = 77/163 (47%) 

Query: 442 LTRVLANIAIHPGVGPVLAANPGIVGLLLTTLEYKSLDDCEELVINATATINNLSYYQVK 501
L +++ NI+ H G P VG L + S D+ EE VI T+ NL+ +
Sbjct: 483 LMKMIRNISQHDG--PTKNLFIDYVGDLAAQI---SSDEEEEFVIECLGTLANLTIPDLD 537

Query: 502
++++ KL +
KL D +LE V + G +S D
Sbjct: 538

Query: 560
LL+AQ +D F C++
-NLTVDKDKR-VILKEGGGIKKLVDCLRD 604
+ + R VI+KE L+D + D
Sbjct: 597

Score = 77 (11.6 bits), Expect = 3.9e-04, Sum P(2) = 3.9e-04
Identities = 42/178 (23%), Positives = 82/178 (46%)

Query: 169
Sbjct: 263

Query: 228
Sbjct: 319

Query: 288
Sbjct: 367

K K
L V ++ LL V
L+ ++ + + + ++N +I+ L++ L
+ + +K +S + N+M+
VE L+ +I
+E++
N E
L + +
D+ L R+K + + LP+L + E YK
+C
+I +
F + +Y D






Query: 342 CCTAL 346
C L
Sbjct: 425 CIPQL 429




Score 


- 69 


(iu.4 bits) , Expect ° 2.oe+uo, sum P(^) ° 9.ze-oi 




Identities ' 


= 35/146 <23%) # Positives - 70/146 (47%) 




Query: 


512 


IAELLLKLLVSNNMDGILEAVRVFGNLSQDHDVCDFIVQNNVHRFMMALLDAQHQDICFS 


571 






I +L+K L +N + ++ V LS + + +V+ ++ ++ ++ +H+D+ 




Sbjct: 


304 


I VHMLVKALDRDNFELLI LVVSFLKKLSI FMENKNDMVEMDI VEKLVKMI PCEHEDLLNI 


363 


Query: 


572 


ACGVLLNLTVDKDKRVILKEGGGIKKLVDCLRDLGPTDW-QLACLVCKTLWNFSENITNA 


630 






+LLNL+ D R++G+KL L G++ Q+A +C L++ S + 




Sbjct: 


364 


TLRLLLNLSFDTGLRNKMVQVGLLPKLTALL GNENYKQIA—MC-VLYHISMD-DRF 


416 


Query: 


631 


SSCFGNEDT-NTLLLLLSSFLDEELALD 657 








S F D L+ +L DE + L+ 




Sbjct: 


417 


KSMFAYTDCIPQLMKMLFECSDERIDLE 444 




Score = 68 (10.2 bits), Expect = 3.2e-03, Sum P(2) = 3.2e-03 





Identities = 18/58 (31%), Positives = 30/58 (51%) 

Query: 190 LIFKISRNEKN-DSLIQNDSILESLLEVLRSE DLQTNMEAFLYCMGSIKFISG 241 

LI +++RN N + L+ N++ L +L VLR + +L TN+ +C S G 

Sbjct: 155 LILQLARNPDNLEELLLNETALGALARVLREDWKQSVELATNIIYIFFCFSSFSHFHG 212 

Score = 65 (9.8 bits), Expect = 6.4e+00, Sum P(2) = 1.0e+00 
Identities = 26/122 (21%), Positives = 53/122 (43%) 



Query: 


283 


LVQVTATLRNL VDSSLVRSKFLNI SALPQLCTAMEQYKGDKDVCTNIARI FSKLTS 


338 






+++ TL NL +D LV ++ +P L ++ + D+ + I S 




Sbjct: 


521 


VIECLGTLANLTIPDLDWELVLKEY KLVPFLKDKLKPGAAEDDLVLEVV-IMIGTVS 


576 


Query: 


339 


YRDCCTALASYSRCYALFLNLINKYQKKQDLVVRVVFILGNLTAKNNQAREQFSKEKGSI 


398 






D C AL + S + L+N Q+ + V +++++ + + R+ KE + 




Sbjct: 


577 


MDDSCAALLAKSGIIPALIELLNAQQEDDEFVCQIIYVFYQMVF-HQATRDVIIKETQAP 


635 


Query: 


399 


QTLLSL 404 








L+ L 




Sbjct: 


636 


AYLIDL 641 




Score = 65 (9.8 bits), Expect = 6.4e+00, Sum P(2) = 1.0e+00 
Identities = 44/177 (24%), Positives = 79/177 (44%) 




Query: 


481 


CE-ELVINATATIN-NLSYYQ-VKNSIIQDKKLYIAELLLKLLVSNNMDGILEAVRVFGN 


537 






CE E ++N T + NLS + ++N ++Q ++ LLL+N IA+V+ 




Sbjct: 


355 


CEHEDLLNITLRLLLNLSFDTGLRNKMVQ VGLLPKLTALLGNEN YKQI - - AMCVL YH 


409 


Query: 


538 


LSQDHDVCD-FIVQNNVHRFMMALLDAQHQDICFSACGVLLNLTVDKDKRVILKEGGGIK 


596 






+SD F +++ML+ +1 +NL +K ++ EG G+K 




Sbjct: 


410 


ISMDDRFKSMFAYTDCIPQLMKMLFECSDERIDLELISFCINLAANKRNVQLICEGNGLK 


469 


Query: 


597 


KLVDCLRDLGPTDWQLACLVCKTLWNFSENITNASSCFGNEDTNTLLLLLSSFLDEELAL 


656 






L+ R L D L+ K + N S++ + F + L +SS +EE + 




Sbjct: 


470 


MLMK- -RALKLKD PLLMKMIRNISQHDGPTKNLF-IDYVGDLAAQISSDEEEEFVI 


522 


Query: 


657 


D 657 
+ 




Sbjct: 


523 


E 523 




Score = 61 (9.2 bits), Expect = 1.6e-02, Sum P(2) = 1.6e-02 
Identities = 20/66 (30%), Positives = 34/66 (51%) 




Query: 


304 


LNISALPQLCTAM-EQYKGDKDVCTNIARIFSKLTSYRDCCTALASYSRCYALFLNLINK 


362 






LN +AL L + E +K ++ TNI IF +S+ + Y + AL +N+I + 




Sbjct: 


171 


LNETALGALARVLREDWKQSVELATNIIYIFFCFSSFSHFHGLITHY-KIGALCMNIIDH 


229 


Query: 


363 


YQKKQDL 369 








K+ +L 




Sbjct: 


230 


ELKRHEL 236 





Pedant information for DKFZphtes3_14p7, frame 2 



Report for DKFZphtes3_14p7.2 



[LENGTH] 708 

[MW] 79266.35 

[pI] 6.57 

[FUNCAT] 30.25 vacuolar and lysosomal organization [S. cerevisiae, YEL013w] 3e-04 

[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YEL013w] 3e-04 

[FUNCAT] 09.25 vacuolar and lysosomal biogenesis [S. cerevisiae, YEL013w] 3e-04 

[BLOCKS] BL00923F Aspartate and glutamate racemases proteins 

[BLOCKS] BL00288B Tissue inhibitors of metalloproteinases proteins 

[PROSITE] MYRISTYL 9 

[PROSITE] AMIDATION 1 

[PROSITE] CK2_PHOSPHO_SITE 12 

[PROSITE] PKC_PHOSPHO_SITE 7 

[PROSITE] ASN_GLYCOSYLATION 11 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 7.49 % 

SEQ ESKETVMMGDSMVKINGIYLTKSNAICHLKSHPLQLTDDGGFSEIKEQEMFKGTTSLPSH 

SEG 

PRD cccceeeecccceeeccccccccceeeeecccccccccccccchhhhhhhhccccccccc 

SEQ LKNGGDQGKRHARASSCPSSSDLSRLQTKAVPKADLQEEDAEIEVDEVFWNTRIVPILRE 

SEG xxxxxxxxxx 

PRD cccccccchhhhhhcccccccchhhhhhhccccchhhhhhhhhhhcccccceeehhhhhh 

SEQ LEKEENIETVCAACTQLHHALEEGNMLGNKFKGRSILLKTLCKLVDVGSDSLSLKLAKII 

SEG xxxxxxxxxx 

PRD hhhhhcchhhhhhhhhhhhhhhhcccccccccccccchhhhhheeeeccccchhhhhhhh 

SEQ LALKVSRKNLLNVCKLIFKISRNEKNDSLIQNDSILESLLEVLRSEDLQTNMEAFLYCMG 

SEG xxxx 

PRD hhhhhhhhhhhhhhhhhccccccccccccccccchhhhhhhhhhhccchhhhhhhhhhcc 

SEQ SIKFISGNLGFLNEMISKGAVEILINLIKQINENIKKCGTFLPNSGHLLVQVTATLRNLV 

SEG 

PRD ceeeeccccchhhhhhhcchhhhhhhhhhhhhcccccccccccccceeeeeehhhhhhhh 

SEQ DSSLVRSKFLNISALPQLCTAMEQYKGDKDVCTNIARIFSKLTSYRDCCTALASYSRCYA 

SEG 

PRD ccchhhhheeeeccchhhhhhhhhhccccceeeehhhhhhhhhhcccchhhhhhhhhhhh 

SEQ LFLNLINKYQKKQDLVVRVVFILGNLTAKNNQAREQFSKEKGSIQTLLSLFQTFHQLDLH 

SEG 

PRD hhhhhhhhhhhhhhhheeeeeeeccccccchhhhhhhhhhhchhhhhhhhhhhhhhhhcc 

SEQ SQKPVGQRGEQHRAQRPPSEAEDVLIKLTRVLANIAIHPGVGPVLAANPGIVGLLLTTLE 

SEG 

PRD ccccccccccccccccccccchhhhhhhhhhhhhhhccccccceeeccccchhhhhhhhh 

SEQ YKSLDDCEELVINATATINNLSYYQVKNSIIQDKKLYIAELLLKLLVSNNMDGILEAVRV 

SEG xxxxxxxxxxxxx 

PRD hhccccchhhhhhhhheeeecccccccceeeehhhhhhhhhhhhhhhccccchhhhhhhh 

SEQ FGNLSQDHDVCDFIVQNNVHRFMMALLDAQHQDICFSACGVLLNLTVDKDKRVILKEGGG 

SEG 

PRD cccccccccceeeeeecchhhhhhhhhhhhcccceeeecceeeeeeecccceeeeecccc 

SEQ IKKLVDCLRDLGPTDWQLACLVCKTLWNFSENITNASSCFGNEDTNTLLLLLSSFLDEEL 

SEG xxxxxxxxxxxxx 

PRD hhhhhhhhhccccccccchhhhhhhhccccccccccccccccccccceeeehhhhhhhhh 

SEQ ALDGSFDPDLKNYHKLHWETEFKPVAQQLLNRIQRHHTFLEPLPIPSF 

SEG xxx 

PRD hhccccccccchhhhhhhhhhchhhhhhhhhhhhhhhheeeecccccc 



Prosite for DKFZphtes3_14p7.2 



PS00001 206->210 ASN_GLYCOSYLATION PDOC00001 
PS00001 212->216 ASN_GLYCOSYLATION PDOC00001 
PS00001 311->315 ASN_GLYCOSYLATION PDOC00001 
PS00001 385->389 ASN_GLYCOSYLATION PDOC00001 
PS00001 493->497 ASN_GLYCOSYLATION PDOC00001 
PS00001 500->504 ASN_GLYCOSYLATION PDOC00001 
PS00001 543->547 ASN_GLYCOSYLATION PDOC00001 
PS00001 584->588 ASN_GLYCOSYLATION PDOC00001 
PS00001 628->632 ASN_GLYCOSYLATION PDOC00001 
PS00001 632->636 ASN_GLYCOSYLATION PDOC00001 
PS00001 635->639 ASN_GLYCOSYLATION PDOC00001 
PS00005 173->176 PKC_PHOSPHO_SITE PDOC00005 
PS00005 186->189 PKC_PHOSPHO_SITE PDOC00005 
PS00005 241->244 PKC_PHOSPHO_SITE PDOC00005 
PS00005 295->298 PKC_PHOSPHO_SITE PDOC00005 
PS00005 344->347 PKC_PHOSPHO_SITE PDOC00005 
PS00005 387->390 PKC_PHOSPHO_SITE PDOC00005 
PS00005 421->424 PKC_PHOSPHO_SITE PDOC00005 
PS00006 79->83 CK2_PHOSPHO_SITE PDOC00006 
PS00006 201->205 CK2_PHOSPHO_SITE PDOC00006 
PS00006 214->218 CK2_PHOSPHO_SITE PDOC00006 
PS00006 218->222 CK2_PHOSPHO_SITE PDOC00006 
PS00006 230->234 CK2_PHOSPHO_SITE PDOC00006 
PS00006 320->324 CK2_PHOSPHO_SITE PDOC00006 
PS00006 344->348 CK2_PHOSPHO_SITE PDOC00006 
PS00006 439->443 CK2_PHOSPHO_SITE PDOC00006 
PS00006 477->481 CK2_PHOSPHO_SITE PDOC00006 
PS00006 483->487 CK2_PHOSPHO_SITE PDOC00006 
PS00006 [position illegible] CK2_PHOSPHO_SITE PDOC00006 
PS00006 698->702 CK2_PHOSPHO_SITE PDOC00006 
PS00008 17->23 MYRISTYL PDOC00008 
PS00008 64->70 MYRISTYL PDOC00008 
PS00008 144->150 MYRISTYL PDOC00008 
PS00008 384->390 MYRISTYL PDOC00008 
PS00008 402->408 MYRISTYL PDOC00008 
PS00008 473->479 MYRISTYL PDOC00008 
PS00008 533->539 MYRISTYL PDOC00008 
PS00008 580->586 MYRISTYL PDOC00008 
PS00008 641->647 MYRISTYL PDOC00008 
PS00009 67->71 AMIDATION PDOC00009 



(No Pfam data available for DKFZphtes3_14p7.2) 



DKFZphtes3_15al3 



group: testes derived 

DKFZphtes3_15al3 encodes a novel 387 amino acid protein with weak similarity to S.cerevisiae 
Hop1. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to S.cerevisiae Hop1 

complete cDNA, complete cds, potential start codon at Bp 116, 3 EST 
hits 

S.cerevisiae Hop1p is a meiosis-specific protein 

Sequenced by GBF 

Locus: unknown 

Insert length: 1848 bp 

Poly A stretch at pos. 1766, no polyadenylation signal found 



1 GGAAAGCGCA TGCGCGTCGG GCACAGCGCG TGCAGCCTCG TGCAGCTCTT 
51 CTGGTCTCCG GCGCCCGCCC CTCAGACGTA ATGTTGAATT AAAGAAAATA 
101 CTTTATCAGA AGAAGATGGC CACTGCCCAG TTGCAGAGGA CTCCCATGAG 
151 TGCACTGGTA TTTCCCAATA AGATATCAAC TGAACACCAG TCTTTGGTGT 
201 TAGTGAAGAG GCTTCTAGCA GTTTCAGTAT CCTGTATCAC GTATTTGAGG 
251 GGAATATTCC CAGAATGCGC TTATGGAACA AGATATCTAG ATGATCTTTG 
301 TGTCAAAATA CTGAGAGAAG ATAAAAATTG CCCAGGATCT ACACAGTTAG 
351 TGAAATGGAT GCTAGGATGT TATGATGCTT TACAGAAAAA ATATGTATAC 
401 ACAAACCCAG AAGATCCTCA GACAATTTCA GAATGTTACC AATTCAAATT 
451 CAAATACACC AATAATGGAC CACTCATGGA CTTCATAAGT AAAAACCAAA 
501 GCAACGAATC TAGCATGTTG TCTACTGACA CCAAGAAAGC AAGCATTCTC 
551 CTCATTCGCA AGATTTATAT CCTAATGCAA AATCTGGGGC CTTTACCTAA 
601 TGATGTTTGT TTGACCATGA AACTTTTTTA CTATGATGAA GTTACACCCC 
651 CAGATTACCA GCCTCCCGGT TTTAAGGATG GTGATTGTGA AGGAGTTATA 
701 TTTGAAGGGG AACCTATGTA TTTAAATGTG GGAGAAGTCT CAACACCTTT 
751 TCACATCTTC AAAGTAAAAG TGACCACTGA GAGAGAACGA ATGGAAAATA 
801 TTGACTCAAC TATACTATCA CCAAAACAAA TAAAAACACC ATTTCAAAAA 
851 ATCCTGAGGG ACAAAGATGT AGAAGATGAA CAGGAGCATT ATACAAGTGA 
901 TGATTTGGAC ATTGAAACTA AAATGGAAGA ACAGGAAAAA AACCCTGCAT 
951 CTTCTGAACT TGAAGAACCA AGTTTAGTTT GTGAGGAAGA TGAAATTATG 
1001 AGGTCTAAAG AAAGTCCAGA TCTTTCTATT TCTCATTCTC AGGTTGAGCA 
1051 GTTAGTCAAT AAAACATCTG AACTTGATAT GTCTGAAAGC AAAACAAGAA 
1101 GTGGAAAAGT CTTTCAGAAT AAAATGGCAA ATGGAAATCA ACCAGTAAAA 
1151 TCTTCCAAAG AAAATCGGAA GAGAAGTCAA CATGAATCTG GGAGAATAGT 
1201 CCTCCATCAC TTTGATTCTT CTAGTCAAGA GTCAGTGCCA AAAAGGAGAA 
1251 AGTTTAGTGA ACCAAAGGAA CATATATAAA AATTATTTTT GTTCTGCAGG 
1301 CTTGCAGAGT TCTTCTCACC ATTTAAACTG AAGGACCCTA TATTATATTT 
1351 CCCTAACTCT GAAGATGTAT ATGTAGTTTA AAGCAGTTTG TACACTAAAA 
1401 CTAAGTTTTT GGCTGACTGT CATATTGTGG TCCTTAATCT TGAGATAAAT 
1451 CCAATAGAAC TTTTGAATAA AAGCAAAAGT ACAAATGTCA TAATTGATTC 
1501 GGTAATAAGT AAAATTTCAA AATTGATTTT GTTCATTACC TACTTAATAT 
1551 TTCCTTTAAA TATATACTAA CTGTTAAGGC CCTCTAATGC CATTTTTCTA 
1601 AACAGTAATG TTTACTTTGG TATTAAAATT TGGTATGGAT TCACTTTTTA 
1651 CTTATGTTAA AATTATACCA TTTAACTGGC TCTTTTGTCA TTGTGCTGTT 
1701 ATTAAAACAA TGTTCTTCAA TATTTTGACA TAATGTATTA ACATTTTAAT 
1751 ATATAATGTA CAATTTAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAGG 
1801 GGCGGCCGCT CTAGAGGATC CAAGCTTACG TACAAAAAAA AAAAAAGG 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 116 bp to 1276 bp; peptide length: 387 
Category: similarity to known protein 
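
The relationship between the stated ORF start and the peptide listed below can be checked with a short translation sketch. This is only an illustration: it assumes the standard genetic code (NCBI translation table 1), and the names `CODON_TABLE`, `translate` and `orf_start_fragment` are hypothetical helpers, not part of the original report. The fragment is nucleotides 116-145 of the insert sequence above.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    dna = dna.upper().replace(" ", "")
    peptide = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon ends the peptide
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Nucleotides 116-145 of the insert (hypothetical variable name).
orf_start_fragment = "ATGGCCACTG CCCAGTTGCA GAGGACTCCC"
print(translate(orf_start_fragment))  # MATAQLQRTP
```

The ten residues obtained this way agree with the first block of the peptide listing that follows.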



1 MATAQLQRTP MSALVFPNKI STEHQSLVLV KRLLAVSVSC ITYLRGIFPE 
51 CAYGTRYLDD LCVKILREDK NCPGSTQLVK WMLGCYDALQ KKYVYTNPED 
101 PQTISECYQF KFKYTNNGPL MDFISKNQSN ESSMLSTDTK KASILLIRKI 
151 YILMQNLGPL PNDVCLTMKL FYYDEVTPPD YQPPGFKDGD CEGVIFEGEP 
201 MYLNVGEVST PFHIFKVKVT TERERMENID STILSPKQIK TPFQKILRDK 
251 DVEDEQEHYT SDDLDIETKM EEQEKNPASS ELEEPSLVCE EDEIMRSKES 
301 PDLSISHSQV EQLVNKTSEL DMSESKTRSG KVFQNKMANG NQPVKSSKEN 
351 RKRSQHESGR IVLHHFDSSS QESVPKRRKF SEPKEHI 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15al3, frame 2 

TREMBL:ATAC2130_3 product: "F1N21.3"; The sequence of BAC F1N21 from 
Arabidopsis thaliana chromosome 1, complete sequence., N = 1, Score = 
274, P = 5.7e-22 

TREMBL:SC9877_9 gene: "hop1"; S.cerevisiae chromosome IX cosmid 9877., 
N = 2, Score = 126, P = 7.1e-09 

PIR:A34691 meiosis-specific protein HOP1 - yeast (Saccharomyces 
cerevisiae), N = 2, Score = 126, P = 7.8e-08 



>TREMBL:ATAC2130_3 product: "F1N21.3"; The sequence of BAC F1N21 from 
Arabidopsis thaliana chromosome 1, complete sequence. 
Length = 562 

HSPs: 

Score = 274 (41.1 bits), Expect = 5.7e-22, P = 5.7e-22 
Identities = 84/290 (28%), Positives = 145/290 (50%) 

Query: 22 TEHQSLVLVKRLLAVSVSCITYLRGIFPECAYGTRYLDDLCVKILREDKNCPGSTQLVKW 81 

TE SL+L + LL +++ I+Y+RG+FPE + + + L +KI + S +L+ W 

Sbjct: 11 TEQDSLLLTRNLLRIAIFNISYIRGLFPEKYFNDKSVPALDMKIKKLMPMDAESRRLIDW 70 

Query: 82 M-LGCYDALQKKYVYT NPEDPQTISECYQFKFKYTNNGP — LMDFISK — NQSN 130 

M G YDALQ+KY+ T D I E Y F F Y+++ +M I++ N+ N 

Sbjct: 71 MEKGVYDALQRKYLKTLMFSICETVDGPMIEE-YSFSFSYSDSDSQDVMMNINRTGNKKN 129 

Query: 131 ESSMLST DTKKASILLIRKIYILMQNLGPLPNDVCLTMKLFYYDEVTPPDYQPP 184 

ST + ++ ++R + LM+ L +P++ + MKL YYD+VTPPDY+PP 

Sbjct: 130 GGIFNSTADITPNQMRSSACKMVRTLVQLMRTLDKMPDERTIVMKLLYYDDVTPPDYEPP 189 

Query: 185 GFKD--GDCEGVIFEGEPMYLNVGEVSTPFHIFKVKVTT ERERMENIDSTILS 235 

F+ 0 ++ P+ + +G V++ + +KV + E + M++ D + 

Sbjct: 190 FFRGCTEDEAQYVWTKNPLRMEIGNVNSKHLVLTLKVKSVLDPCEDENDDMQD-DGKSIG 248 

Query: 236 PKQIKTPFQKILRDKDVEDEQEHY TSDDLDIETKMEEQEKN PASSE 281 

P + Q D ++ QE+ DD D E ++ ++PA +E 

Sbjct: 249 PDSVHDD-QPSDSDSEISQTQENQFIVAPVEKQDDDDGEVDEDDNTQDPAENE 300 



Pedant information for DKFZphtes3_15al3, frame 2 



Report for DKFZphtes3_15al3 .2 



[LENGTH] 387 

[MW] 44417.64 

[pI] 5.57 

[HOMOL] TREMBL:ATAC2130_3 product: "F1N21.3"; The sequence of BAC F1N21 from 
Arabidopsis thaliana chromosome 1, complete sequence. 9e-23 

[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YIL072w] 7e-11 

[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YIL072w] 7e-11 

[FUNCAT] 03.13 meiosis [S. cerevisiae, YIL072w] 7e-11 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YIL072w] 7e-11 

[PIRKW] nucleus 2e-09 

[PIRKW] zinc finger 2e-09 

[PIRKW] DNA binding 2e-09 

[PROSITE] MYRISTYL 1 

[PROSITE] CAMP_PHOSPHO_SITE 3 

[PROSITE] CK2_PHOSPHO_SITE 12 

[PROSITE] PKC_PHOSPHO_SITE 7 

[PROSITE] ASN_GLYCOSYLATION 3 

[KW] Alpha_Beta 



SEQ MATAQLQRTPMSALVFPNKISTEHQSLVLVKRLLAVSVSCITYLRGIFPECAYGTRYLDD 

PRD cccccccccccccccccccchhhhhhhhhhhhhhhhhhhhheeeeecccccccccccchh 

SEQ LCVKILREDKNCPGSTQLVKWMLGCYDALQKKYVYTNPEDPQTISECYQFKFKYTNNGPL 

PRD hhhhhhhccccccccccccccccchhhhhhhhhhhcccccccchhhhhheeeeeccccce 

SEQ MDFISKNQSNESSMLSTDTKKASILLIRKIYILMQNLGPLPNDVCLTMKLFYYDEVTPPD 

PRD eeeecccccccceeecccchhhhhhhhhhhhhhhhhcccccccccceeeeeeeeeccccc 

SEQ YQPPGFKDGDCEGVIFEGEPMYLNVGEVSTPFHIFKVKVTTERERMENIDSTILSPKQIK 

PRD cccccccccccceeeeeccceeeeeccccccceeeeeecccchhhhhcccccccccchhh 

SEQ TPFQKILRDKDVEDEQEHYTSDDLDIETKMEEQEKNPASSELEEPSLVCEEDEIMRSKES 

PRD hhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhcccccccccccccchhhhhhhhhhcc 

SEQ PDLSISHSQVEQLVNKTSELDMSESKTRSGKVFQNKMANGNQPVKSSKENRKRSQHESGR 

PRD ccccccchhhhhhhhhhcccccccccccccceeeeeccccccccchhhhhhhhhhcccce 

SEQ IVLHHFDSSSQESVPKRRKFSEPKEHI 

PRD eeeeecccccccccccccccccccccc 



Prosite for DKFZphtes3_15al3.2 

PS00001 127->131 ASN_GLYCOSYLATION PDOC00001 
PS00001 130->134 ASN_GLYCOSYLATION PDOC00001 
PS00001 315->319 ASN_GLYCOSYLATION PDOC00001 
PS00004 140->144 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 351->355 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 378->382 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 139->142 PKC_PHOSPHO_SITE PDOC00005 
PS00005 167->170 PKC_PHOSPHO_SITE PDOC00005 
PS00005 221->224 PKC_PHOSPHO_SITE PDOC00005 
PS00005 235->238 PKC_PHOSPHO_SITE PDOC00005 
PS00005 329->332 PKC_PHOSPHO_SITE PDOC00005 
PS00005 346->349 PKC_PHOSPHO_SITE PDOC00005 
PS00005 358->361 PKC_PHOSPHO_SITE PDOC00005 
PS00006 96->100 CK2_PHOSPHO_SITE PDOC00006 
PS00006 103->107 CK2_PHOSPHO_SITE PDOC00006 
PS00006 177->181 CK2_PHOSPHO_SITE PDOC00006 
PS00006 221->225 CK2_PHOSPHO_SITE PDOC00006 
PS00006 260->264 CK2_PHOSPHO_SITE PDOC00006 
PS00006 268->272 CK2_PHOSPHO_SITE PDOC00006 
PS00006 280->284 CK2_PHOSPHO_SITE PDOC00006 
PS00006 308->312 CK2_PHOSPHO_SITE PDOC00006 
PS00006 318->322 CK2_PHOSPHO_SITE PDOC00006 
PS00006 346->350 CK2_PHOSPHO_SITE PDOC00006 
PS00006 354->358 CK2_PHOSPHO_SITE PDOC00006 
PS00006 369->373 CK2_PHOSPHO_SITE PDOC00006 
PS00008 84->90 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphtes3_15al3.2) 



group: metabolism 

DKFZphtes3_15c24 encodes a novel 404 amino acid protein with strong similarity to 
2-hydroxyacid dehydrogenases. 

The novel protein contains a D-isomer specific 2-hydroxyacid dehydrogenases signature. 
Proteins with such a signature have similar enzymatic activities: D-lactate dehydrogenase 
(EC 1.1.1.28) catalyzes the reduction of pyruvate to D-lactate; D-glycerate dehydrogenase 
(EC 1.1.1.29) catalyzes the reduction of hydroxypyruvate to glycerate; 3-phosphoglycerate 
dehydrogenase (EC 1.1.1.95) catalyzes the oxidation of D-3-phosphoglycerate to 
3-phosphohydroxypyruvate. 
Therefore the novel protein is a new 2-hydroxyacid dehydrogenase. 

The new protein can find application in modulation of 2-hydroxyacid dehydrogenase-dependent 
pathways and as a new enzyme for biotechnological production processes. 



strong similarity to C.elegans T03F1.1 

potential start at Bp 55 matches kozak consensus PyCCatgG 

Sequenced by GBF 

Locus: unknown 

Insert length: 1956 bp 

Poly A stretch at pos. 1929, polyadenylation signal at pos. 1903 



1 CGAAGGCGGC GGCGAAGGCC CGGGCTGGGA GCGTTGGCGG CCGGAGTCCC 
51 AGCCATGGCG GAGTCTGTGG AGCGCCTGCA GCAGCGGGTC CAGGAGCTGG 
101 AGCGGGAACT TGCCCAGGAG AGGAGTCTGC AGGTCCCGAG GAGCGGCGAC 
151 GGAGGGGGCG GCCGGGTCCG CATCGAGAAG ATGAGCTCAG AGGTGGTGGA 
201 TTCGAATCCC TACAGCCGCT TGATGGCATT GAAACGAATG GGAATTGTAA 
251 GCGACTATGA GAAAATCCGT ACCTTTGCCG TAGCAATAGT AGGTGTTGGT 
301 GGAGTAGGTA GTGTGACTGC TGAAATGCTG ACAAGATGTG GCATTGGTAA 
351 GTTGCTACTC TTTGATTATG ACAAGGTGGA ACTAGCCAAT ATGAATAGAC 
401 TTTTCTTCCA ACCTCATCAA GCAGGATTAA GTAAAGTTCA AGCAGCAGAA 
451 CATACTCTGA GGAACATTAA TCCTGATGTT CTTTTTGAAG TACACAACTA 
501 TAATATAACC ACAGTGGAAA ACTTTCAACA TTTCATGGAT AGAATAAGTA 
551 ATGGTGGGTT AGAAGAAGGA AAACCTGTTG ATCTAGTTCT TAGCTGTGTG 
601 GACAATTTTG AAGCTCGAAT GACAATAAAT ACAGCTTGTA ATGAACTTGG 
651 ACAAACATGG ATGGAATCTG GGGTCAGTGA AAATGCAGTT TCAGGGCATA 
701 TACAGCTTAT AATTCCTGGA GAATCTGCTT GTTTTGCGTG TGCTCCACCA 
751 CTTGTAGTTG CTGCAAATAT TGATGAAAAG ACTCTGAAAC GAGAAGGTGT 
801 TTGTGCAGCC AGTCTTCCTA CCACTATGGG TGTGGTTGCT GGGATCTTAG 
851 TACAAAACGT GTTAAAGTTT CTGTTAAATT TTGGTACTGT TAGTTTTTAC 
901 CTTGGATACA ATGCAATGCA GGATTTTTTT CCTACTATGT CCATGAAGCC 
951 AAATCCTCAG TGTGATGACA GAAATTGCAG GAAGCAGCAG GAGGAATATA 
1001 AGAAAAAGGT AGCAGCACTG CCTAAACAAG AGGTTATACA AGAAGAGGAA 
1051 GAGATAATCC ATGAAGATAA TGAATGGGGT ATTGAGCTGG TATCTGAGGT 
1101 TTCAGAAGAG GAACTGAAAA ATTTTTCAGG TCCAGTTCCA GACTTACCTG 
1151 AAGGAATTAC AGTGGCATAC ACAATTCCAA AAAAGCAAGA AGATTCTGTC 
1201 ACTGAGTTAA CAGTGGAAGA TTCTGGTGAA AGCTTGGAAG ACCTCATGGC 
1251 CAAAATGAAG AATATGTAGA TAATGGACTG GGATATATTG TATTTCTCAT 
1301 GTTAAAGCCT CTTCCCTTGA AATTAAAAAA AAATTTTAAC TGATAAAACT 
1351 TAGGGCAACA TTAATTAATG TATATTCTTA CCTGAATTGT TATACTTTTT 
1401 GAAAATCCTG TGACTTGCCT GTTTCTCCCC GCTCCAACGA AATCATTAAC 
1451 TCTCCTAAAA TGTGTTTCAT TCTAGTAAGA AAACCTCAAA GGATATTGTA 
1501 GGATATAAAT CTTACTTGAA AACATAGCTG TTGAAATGTT TTGGCCTTTT 
1551 GGAGTGGGGG AAGGACAAAT CTGATCCTGT AATCTTTTTC TTTCCAGTAA 
1601 TCCCTTGTGT CTGTTGCATG AGGACATGGA CAATAAAGTA GTATATGATC 
1651 CTCAGATACA GGGAGAAGGA CAAGGCATAC AGCTTATTGA TTAGAGCTGG 
1701 CAAGCATCTG CTCATTATGT TTGGAATTGC TTTCTATAAG AAAATTGCCC 
1751 ACTACTACTA ACTTGATCAA CAATGAATTC AAAATAGTTA ACCTATGAAA 
1801 TAACATCCTC TCAAATGTTT GCTGATGAAG TACAAGTTGA AATGTAGTTA 
1851 TTGGAAAAGT CTGTAACCTG TGGATCATAT ATATTCAAAG TGAGACAAAG 
1901 GCAAATAAAA AGCAGCTATT TTCATGAATA GACAAAAAAA AAAAAAAAAA 
1951 AAAAAG 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 55 bp to 1266 bp; peptide length: 404 
Category: similarity to unknown protein 
Classification: Metabolism 

Prosite motifs: D_2_HYDROXYACID_DH_1 (76-105) 



1 MAESVERLQQ RVQELERELA QERSLQVPRS GDGGGGRVRI EKMSSEVVDS 
51 NPYSRLMALK RMGIVSDYEK IRTFAVAIVG VGGVGSVTAE MLTRCGIGKL 
101 LLFDYDKVEL ANMNRLFFQP HQAGLSKVQA AEHTLRNINP DVLFEVHNYN 
151 ITTVENFQHF MDRISNGGLE EGKPVDLVLS CVDNFEARMT INTACNELGQ 
201 TWMESGVSEN AVSGHIQLII PGESACFACA PPLVVAANID EKTLKREGVC 
251 AASLPTTMGV VAGILVQNVL KFLLNFGTVS FYLGYNAMQD FFPTMSMKPN 
301 PQCDDRNCRK QQEEYKKKVA ALPKQEVIQE EEEIIHEDNE WGIELVSEVS 
351 EEELKNFSGP VPDLPEGITV AYTIPKKQED SVTELTVEDS GESLEDLMAK 
401 MKNM 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15c24, frame 1 

TREMBL:CEUT03F1_11 gene: "T03F1.1"; Caenorhabditis elegans cosmid 
T03F1., N = 1, Score = 1204, P = 1.9e-122 

TREMBL:ATAC98_3 gene: "YUP8H12.3"; Arabidopsis thaliana chromosome 1 
YAC YUP8H12 complete sequence., N = 1, Score = 733, P = 1.5e-72 

PIR:A69319 thiamine biosynthesis protein (thiF) homolog - Archaeoglobus 
fulgidus, N = 1, Score = 218, P = 1.8e-17 

TREMBL:AF022796_4 gene: "moeB"; product: "MoeB"; Staphylococcus 
carnosus molybdenum cofactor biosynthetic gene cluster, complete 
sequence., N = 1, Score = 220, P = 3.7e-16 



>TREMBL:CEUT03F1_11 gene: "T03F1.1"; Caenorhabditis elegans cosmid T03F1. 
Length = 419 

HSPs: 

Score = 1204 (180.6 bits), Expect = 1.9e-122, P = 1.9e-122 
Identities = 241/367 (65%), Positives = 293/367 (79%) 

Query: 37 RVRIEKMSSEVVDSNPYSRLMALKRMGIVSDYEKIRTFAVAIVGVGGVGSVTAEMLTRCG 96 

R +IEK+S+EVVDSNPYSRLMAL+RMGIV++YE+IR VA+VGVGGVGSV AEMLTRCG 
Sbjct: 48 RQKIEKLSAEVVDSNPYSRLMALQRMGIVNEYERIREKTVAVVGVGGVGSVVAEMLTRCG 107 

Query: 97 IGKLLLFDYDKVELANMNRLFFQPHQAGLSKVQAAEHTLRNINPDVLFEVHNYNITTVEN 156 

IGKL+LFDYDKVE+ANMNRLF+QP+QAGLSKV+AA TL ++NPDV EVHN+NITT++N 
Sbjct: 108 IGKLILFDYDKVEIANMNRLFYQPNQAGLSKVEAARDTLIHVNPDVQIEVHNFNITTMDN 167 

Query: 157 FQHFMDRISNGGLEEGKPVDLVLSCVDNFEARMTINTACNELGQTWMESGVSENAVSGHI 216 

F F++RI G L +GK +DLVLSCVDNFEARM +N ACNE Q WMESGVSENAVSGHI 
Sbjct: 168 FDTFVNRIRKGSLTDGK-IDLVLSCVDNFEARMAVNMACNEENQIWMESGVSENAVSGHI 226 

Query: 217 QLIIPGESACFACAPPLVVAANIDEKTLKREGVCAASLPTTMGVVAGILVQNVLKFLLNF 276 

Q I PG++ACFAC PPLVVA+ IDE+TLKR+GVCAASLPTTM VVAG LV N LK+LLNF 
Sbjct: 227 QYIEPGKTACFACVPPLVVASGIDERTLKRDGVCAASLPTTMAVVAGFLVMNTLKYLLNF 286 

Query: 277 GTVSFYLGYNAMQDFFPTMSMKPNPQCDDRNCRKQQEEYKKKVAALPKQ-EV-IQEEEEI 334 

G VS Y+GYNA+ DFFP S+KPNP CDD +C ++Q+EY++KVA P EV + EEE + 
Sbjct: 287 GEVSQYVGYNALSDFFPRDSIKPNPYCDDSHCLQRQKEYEEKVANQPVDLEVEVPEEETV 346 

Query: 335 IHEDNEWGIELVSEVSEEELKNFSGPVPDLPEGITVAYTIPKKQEDSVTELTVEDSGESL 394 

+HEDNEWGIELV+E SE + S + G+ AY P K+ D+ TEL+ + + 
Sbjct: 347 VHEDNEWGIELVNE-SEPSAEQSSSL--NAGTGLKFAYE-PIKR-DAQTELSPAQA--AT 399 

Query: 395 EDLMAKMKN 403 

D M +K+ 
Sbjct: 400 HDFMKSIKD 408 



Pedant information for DKFZphtes3_15c24, frame 1 



Report for DKFZphtes3_15c24.1 



[LENGTH] 404 

[MW] 44863.36 

[pI] 4.79 

[HOMOL] TREMBL:CEUT03F1_11 gene: "T03F1.1"; Caenorhabditis elegans cosmid T03F1. 1e-115 

[FUNCAT] h cofactor metabolism [H. influenzae, HI1449] 2e-08 

[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, 
palmitylation, farnesylation and processing) [S. cerevisiae, YDR390c UBA2 - E1-like] 4e-07 

[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. 
cerevisiae, YDR390c UBA2 - E1-like] 4e-07 

[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YDR390c UBA2 - E1-like] 4e-07 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YDR390c UBA2 - E1-like] 4e-07 

[FUNCAT] 11.01 stress response [S. cerevisiae, YKL210w UBA1 - E1-like] 2e-06 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YKL210w UBA1 - E1-like] 2e-06 

[BLOCKS] BL01042A Homoserine dehydrogenase proteins 

[PIRKW] thiamine pyrophosphate 1e-07 

[PIRKW] molybdenum 5e-07 

[PIRKW] molybdopterin biosynthesis 5e-07 

[SUPFAM] molybdopterin biosynthesis protein moeB 2e-12 

[PROSITE] D_2_HYDROXYACID_DH_1 1 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 8.66 % 



SEQ MAESVERLQQRVQELERELAQERSLQVPRSGDGGGGRVRIEKMSSEVVDSNPYSRLMALK 

SEG 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhcccccccccccceeeccccccccccchhhhhhhc 

MEM 

SEQ RMGIVSDYEKIRTFAVAIVGVGGVGSVTAEMLTRCGIGKLLLFDYDKVELANMNRLFFQP 

SEG xxxxxxxxx 

PRD cccccchhhhhhhheeeeecccccchhhhhhhhhhcccceeeecccccchhhhhhhhhhc 

MEM MMMMMMMMMMMMMMMMMMMMMM 

SEQ HQAGLSKVQAAEHTLRNINPDVLFEVHNYNITTVENFQHFMDRISNGGLEEGKPVDLVLS 

SEG 

PRD ccccchhhhhhhhhhhhccccceeeeeccccccchhhhhhhhhhhcccccccccceeeee 

MEM 

SEQ CVDNFEARMTINTACNELGQTWMESGVSENAVSGHIQLIIPGESACFACAPPLVVAANID 

SEG 

PRD cccchhhhhhhhhhhhhhccccccccccccccccceeeeccccccceeeccccccccccc 

MEM 

SEQ EKTLKREGVCAASLPTTMGVVAGILVQNVLKFLLNFGTVSFYLGYNAMQDFFPTMSMKPN 

SEG 

PRD ccccccccccccccccchhhhhhhhhhhhhhhhhcccceeeccccccccccccccccccc 

MEM 



SEQ PQCDDRNCRKQQEEYKKKVAALPKQEVIQEEEEIIHEDNEWGIELVSEVSEEELKNFSGP 

SEG xxxxxxxxxxxxxxx. . .xxxxxxxxxxx 

PRD ccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccceeeeeehhhhhhhhhcccc 

MEM 

SEQ VPDLPEGITVAYTIPKKQEDSVTELTVEDSGESLEDLMAKMKNM 

SEG 

PRD ccccccceeeeeeehhhhhhhheeeeeccccchhhhhhhhhccc 

MEM 



Prosite for DKFZphtes3_15c24.1 
PS00065 76->105 D_2_HYDROXYACID_DH_1 PDOC00063 



(No Pfam data available for DKFZphtes3_15c24.1) 






DKFZphtes3_15c6 



group: transmembrane protein 

DKFZphtes3_15c6 encodes a novel 118 amino acid protein without similarity to known proteins. 
The novel protein contains 1 transmembrane region. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes and as a new marker for testicular cells. 



unknown 

complete cDNA, complete cds, EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 1283 bp 

Poly A stretch at pos. 1264, no polyadenylation signal found 



1 GAGACACTGA GCCCCGAGAC AGTGAGTGGT GGCCTCACTG CTCTGCCCGG 

51 CACCCTGTCA CCTCCACTTT GCCTTGTTGG AAGTGACCCA GCCCCCTCCC 

101 CTTCCATTCT CCCACCTGTT CCCCAGGACT CACCCCAGCC CCTGCCTGCC 

151 CCTGAGGAAG AAGAGGCACT CACCACTGAG GACTTTGAGT TGCTGGATCA 

201 GGGGGAGCTG GAGCAGCTGA ATGCAGAGCT GGGCTTGGAG CCAGAGACAC 

251 CGCCAAAACC CCCTGATGCT CCACCCCTGG GGCCCGACAT CCATTCTCTG 

301 GTACAGTCAG ACCAAGAAGC TCAGGCCGTG GCAGAGCCAT GAGCCAGCCG 

351 TTGAGGAAGG AGCTGCAGGC ACAGTAGGGC TTCCTGGCTA GGAGTGTTGC 

401 TGTTTCCTCC TTTGCCTACC ACTCTGGGGT GGGGCAGTGT GTGGGGAAGC 

451 TGGCTGTCGG ATGGTAGCTA TTCCACCCTC TGCCTGCCTG CCTGCCTGCT 

501 GTCCTGGGCA TGGTGCAGTA CCTGTGCCTA GGATTGGTTT TAAATTTGTA 

551 AATAATTTTC CATTTGGGTT AGTGGATGTG AACAGGGCTA GGGAAGTCCT 

601 TCCCACAGCC TGCGCTTGCC TCCCTGCCTC ATCTCTATTC TCATTCCACT 

651 ATGCCCCAAG CCCTGGTGGT CTGGCCCTTT CTTTTTCCTC CTATCCTCAG 

701 GGACCTGTGC TGCTCTGCCC TCATGTCCCA CTTGGTTGTT TAGTTGAGGC 

751 ACTTTATAAT TTTTCTCTTG TCTTGTGTTC CTTTCTGCTT TATTTCCCTG 

801 CTGTGTCCTG TCCTTAGCAG CTCAACCCCA TCCTTTGCCA GCTCCTCCTA 

851 TCCCGTGGGC ACTGGCCAAG CTTTAGGGAG GCTCCTGGTC TGGGAAGTAA 

901 AGAGTAAACC TGGGGCAGTG GGTCAGGCCA GTAGTTACAC TCTTAGGTCA 

951 CTGTAGTCTG TGTAACCTTC ACTGCATCCT TGCCCCATTC AGCCCGGCCT 

1001 TTCATGATGC AGGAGAGCAG GGATCCCGCA GTACATGGCG CCAGCACTGG 

1051 AGTTGGTGAG CATGTGCTCT CTCTTGAGAT TAGGAGCTTC CTTACTGCTC 

1101 CTCTGGGTGA TCCAAGTGTA GTGGGACCCC CTACTAGGGT CAGGAAGTGG 

1151 ACACTAACAT CTGTGCAGGT GTTGACTTGA AAAATAAAGT GTTGATTGGC 

1201 TAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAGGGCGGCC GCTCTAGAGG 

1251 ATCCAAGCTT ACGTAAAAAA AAAAAAAAAA AAG 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 461 bp to 814 bp; peptide length: 118 
Category: putative protein 
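
The ORF coordinates and peptide lengths quoted in these reports are related by simple arithmetic, which the following sketch makes explicit. It assumes, as the three entries in this section suggest, that the quoted interval is 1-based, inclusive and excludes the stop codon; `orf_peptide_length` is a hypothetical helper name.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Peptide length for a 1-based, inclusive ORF interval without its stop codon."""
    span = end_bp - start_bp + 1
    if span % 3 != 0:
        raise ValueError(f"ORF span of {span} bp is not a whole number of codons")
    return span // 3


# ORF coordinates as quoted in the reports of this section.
reported_orfs = {
    "DKFZphtes3_15al3": (116, 1276),
    "DKFZphtes3_15c24": (55, 1266),
    "DKFZphtes3_15c6": (461, 814),
}
for clone, (start, end) in reported_orfs.items():
    print(clone, orf_peptide_length(start, end))
```

Under that assumption the three intervals give 387, 404 and 118 residues, matching the peptide lengths stated in the respective reports.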



1 MVAIPPSACL PACCPGHGAV PVPRIGFKFV NNFPFGLVDV NRAREVLPTA 
51 CACLPASSLF SFHYAPSPGG LALSFSSYPQ GPVLLCPHVP LGCLVEALYN 
101 FSLVLCSFLL YFPAVSCP 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15c6, frame 2 

PIR:S54250 ribosomal protein L2 - Arabidopsis thaliana, N = 1, Score = 
76, P = 0.33 



>PIR:S54250 ribosomal protein L2 - Arabidopsis thaliana 
Length = 258 

HSPs: 

Score = 76 (11.4 bits), Expect = 4.0e-01, P = 3.3e-01 
Identities = 30/91 (32%), Positives = 44/91 (48%) 
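
The figures in an HSP header can be sanity-checked with a few lines of arithmetic. The sketch below rests on two assumptions that are not stated in the report itself: that P and Expect follow the Poisson relation P = 1 - exp(-E), and that the percentages are truncated rather than rounded. The function names are hypothetical.

```python
import math


def p_from_expect(expect: float) -> float:
    """Probability of at least one chance hit, assuming P = 1 - exp(-E)."""
    return -math.expm1(-expect)  # numerically stable form of 1 - exp(-E)


def percent(matches: int, aligned_length: int) -> int:
    """Integer percentage, truncated toward zero."""
    return 100 * matches // aligned_length


print(f"{p_from_expect(0.40):.1e}")        # 3.3e-01
print(percent(30, 91), percent(44, 91))    # 32 48
```

For very small Expect values the two quantities coincide to the printed precision, which is consistent with the stronger hits elsewhere in this section reporting identical Expect and P values.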

Query: 15 PGHGAVPVPRIGFKFVNNFPFGLVDVNRAREVLPTACACLPASSLFSFHYAPSPGGLALS 74 

PG GA P+ R+ F+ PF + +E+ A C P SSL+ A G L 

Sbjct: 52 PGRGA- PL ARVTFRH PFRF KKQKELFVAAEVCTPVSSLYCGKKATLWGNVLP 103 

Query: 75 FSSYPQGPVLLCP HV-PLGCLVEALYNFSLVL 105 

S P+G V+ C HV G L A ++++V+ 
Sbjct: 104 LRSIPEGAVV-CNVEHHVGDRGVLARASGDYAIVI 137 



Pedant information for DKFZphtes3_15c6, frame 2 



Report for DKFZphtes3_15c6.2 



[LENGTH] 118 

[MW] 12413.79 

[pI] 7.53 

[PROSITE] LEUCINE_ZIPPER 1 

[PROSITE] MYRISTYL 1 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] TRANSMEMBRANE 1 



SEQ MVAIPPSACLPACCPGHGAVPVPRIGFKFVNNFPFGLVDVNRAREVLPTACACLPASSLF 
PRD cccccccccccccccccccccccccceeeecccccceeehhhhhhccccceeeccccccc 
MEM 


SEQ SFHYAPSPGGLALSFSSYPQGPVLLCPHVPLGCLVEALYNFSLVLCSFLLYFPAVSCP 
PRD eeecccccccceeeeecccccccccccccccchhhhhhhcchhhhhhhhccccccccc 
MEM MMMMMMMMMMMMMMMMM . 



Prosite for DKFZphtes3_15c6.2 

PS00001 100->104 ASN_GLYCOSYLATION PDOC00001 

PS00008 70->76 MYRISTYL PDOC00008 

PS00029 84->106 LEUCINE_ZIPPER PDOC00029 
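
The three Prosite hits listed above can be reproduced with a small pattern scan. This is a sketch, not the PEDANT implementation: it assumes the published PROSITE consensus patterns for PS00001 (N-{P}-[ST]-{P}), PS00008 (G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}) and PS00029 (L-x(6)-L-x(6)-L-x(6)-L), hand-translated to regular expressions, and it assumes the report's "start->end" convention is a 1-based start with end = start + pattern length. `PATTERNS` and `scan` are hypothetical names.

```python
import re

# PROSITE consensus patterns rewritten as regular expressions.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
    "MYRISTYL": r"G[^EDRKHPFYW]..[STAGCN][^P]",
    "LEUCINE_ZIPPER": r"L.{6}L.{6}L.{6}L",
}

# Peptide of DKFZphtes3_15c6, frame 2, as listed in this report.
PEPTIDE = (
    "MVAIPPSACLPACCPGHGAVPVPRIGFKFVNNFPFGLVDVNRAREVLPTA"
    "CACLPASSLFSFHYAPSPGGLALSFSSYPQGPVLLCPHVPLGCLVEALYN"
    "FSLVLCSFLLYFPAVSCP"
)


def scan(sequence: str, pattern: str) -> list[tuple[int, int]]:
    """Return (start, start + length) for every match, overlapping ones included."""
    lookahead = re.compile(f"(?=({pattern}))")  # lookahead permits overlapping hits
    return [
        (m.start() + 1, m.start() + 1 + len(m.group(1)))
        for m in lookahead.finditer(sequence)
    ]


for name, pattern in PATTERNS.items():
    for start, end in scan(PEPTIDE, pattern):
        print(f"{start}->{end} {name}")
```

Run on the 118-residue peptide, the scan yields one hit per pattern at the same coordinates as the table above.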



(No Pfam data available for DKFZphtes3_15c6.2) 



DKFZphtes3_15gl4 



group: testes derived 

DKFZphtes3_15gl4 encodes a novel 701 amino acid protein with weak similarity to S. cerevisiae 
hypothetical protein YOR243c. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to YOR243c 

complete cDNA, complete cds, potential start codon at Bp 35, EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 3495 bp 

Poly A stretch at pos. 3462, no polyadenylation signal found 

1 GCCTTCCACT GAACCGAGGC ACTGTTATAG AAGAATGGAA GAAGATACAG 
51 ATTATAGAAT CAGGTTTAGT TCTTTGTGTT TCTTTAATGA TCACGTTGGA 

101 TTTCATGGCA CTATAAAAAG CTCACCAAGT GACTTTATTG TTATTGAAAT 

151 TGATGAACAG GGACAGTTAG TTAATAAGAC CATCGATGAG CCTATTTTCA 

201 AGATTAGTGA AATACAACTT GAGCCAAATA ATTTTCCCAA AAAACCAAAA 

251 CTAGATCTTC AAAATCTGTC CTTAGAAGAT GGAAGAAACC AAGAAGTTCA 

301 TACTTTGATT AAGTACACTG ATGGTGACCA AAATCATCAG TCTGGTTCAG 

351 AAAAGGAAGA TACTATCGTT GATGGAACTT CCAAATGTGA AGAAAAAGCT 

401 GATGTTTTAA GCTCCTTTTT GGATGAAAAA ACTCATGAGT TACTGAATAA 

451 TTTTGCCTGT GATGTAAGAG AGAAGTGGCT TTCTAAAACA GAGCTAATTG 

501 GACTACCTCC TGAATTCTCA ATAGGCAGAA TCCTTGACAA AAACCAGAGG 

551 GCTAGTTTAC ACAGTGCCAT TAGGCAGAAA TTTCCATTTT TAGTAACTGT 

601 AGGAAAAAAC AGTGAAATTG TTGTAAAACC AAATCTTGAA TATAAAGAAC 

651 TTTGTCATTT GGTATCTGAA GAGGAAGCAT TTGACTTTTT TAAATATTTG 

701 GATGCAAAGA AAGAAAATTC CAAATTTACC TTTAAACCTG ATACAAACAA 

751 AGACCACAGA AAAGCTGTCC ACCATTTTGT CAACAAAAAG TTTGGAAACC 

801 TTGTGGAAAC CAAATCTTTT TCTAAAATGA ATTGCAGTGC TGGTAATCCG 

851 AATGTGGTGG TAACAGTAAG ATTTCGGGAA AAAGCACACA AACGTGGGAA 

901 AAGGCCTCTT TCTGAATGCC AAGAAGGAAA AGTTATATAT ACAGCTTTTA 

951 CCCTACGAAA GGAAAACCTG GAAATGTTTG AAGCGATTGG TTTTTTAGCT 
1001 ATCAAACTTG GTGTTATTCC TTCGGATTTT AGTTATGCAG GCCTTAAAGA 
1051 CAAGAAAGCC ATCACCTATC AAGCAATGGT TGTTAGAAAA GTGACTCCAG 
1101 AGAGGTTGAA AAATATTGAA AAAGAAATTG AAAAGAAAAG AATGAATGTC 
1151 TTTAATATTC GGTCTGTAGA TGATTCCCTG AGACTTGGTC AGCTCAAAGG 
1201 AAATCACTTT GATATTGTCA TTAGAAATTT AAAAAAACAA ATAAATGATT 
1251 CTGCAAACCT GAGGGAGAGA ATTATGGAAG CAATAGAAAA TGTTAAGAAA 
1301 AAAGGCTTTG TGAATTACTA TGGACCACAG AGATTTGGGA AGGGAAGGAA 
1351 AGTTCACACA GACCAAATTG GACTAGCTTT GCTGAAGAAT GAAATGATGA 
1401 AAGCCATAAA ATTGTTTCTT ACACCAGAAG ACTTGGATGA TCCTGTAAAT 
1451 AGAGCAAAGA AGTATTTTCT TCAAACTGAG GATGCTAAAG GCACACTTTC 
1501 ATTGATGCCT GAATTCAAAG TGCGTGAGAG AGCATTGTTG GAGGCATTGC 
1551 ACCGCTTTGG CATGACCGAG GAAGGTTGTA TCCAGGCATG GTTCTCTTTA 
1601 CCCCATTCCA TGCGCATATT CTATGTTCAC GCATATACCA GCAAAATTTG 
1651 GAATGAGGCA GTATCTTACA GACTTGAAAC CTATGGAGCA AGAGTAGTGC 
1701 AGGGTGATTT GGTCTGTTTG GATGAAGACA TTGATGACGA GAATTTCCCA 
1751 AATAGTAAAA TTCACCTGGT AACTGAAGAG GAGGGATCAG CTAATATGTA 
1801 TGCAATACAT CAGGTGGTTC TTCCAGTACT TGGATACAAT ATTCAGTACC 
1851 CGAAGAACAA AGTAGGGCAG TGGTACCATG ACATACTTAG CAGAGATGGA 
1901 CTACAGACAT GTAGGTTTAA AGTACCTACT CTGAAACTGA ATATACCAGG 
1951 TTGCTATAGA CAGATTTTGA AACATCCCTG TAATCTCTCA TACCAACTAA 
2001 TGGAAGATCA TGACATTGAT GTCAAAACGA AAGGTTCCCA CATTGATGAA 
2051 ACAGCTTTGT CTCTTTTGAT CTCTTTTGAT CTTGATGCTT CATGCTATGC 
2101 TACCGTTTGT CTGAAGGAAA TAATGAAGCA TGACGTTTAA AACTGATACC 
2151 CTTGGTATAA CCATATATAT GTCACCCTTT CCTGTTTTTG AAATTATTGA 
2201 TCAGAACAAT ATACAAGGGA AATGCCATAC CTCTGTTTGT GATAGATACC
2251 CCAGAGTAGT TATTACCTCT TTGTGAGATA AGTAATCTTT GATGAAGATT 
2301 GAAATACAAT TTCTCATCCA ATTTTTATAT CTTGGCATAC GCTGACCCTC 
2351 TTGACCATTT GTAATTTTTT CATATTATCT AAAACAGGTG TTAGAGTCAG 
2401 ACAGATTCAT TCTTAGATTC TAGCTCTGAC ACTTACTAGT GATTTTGAGT 
2451 ATGTTGTTGA TTTTTTTGTG TGTGGTTACT GATAGAATCA AGACAATTAC 
2501 AACTTCATAA ATGACAAATA ATAGGATTAT CTCCACATTT TCTGTTGCTG 
2551 GAGGAACAAA ACATTGTGCC CATTTGAAAA TTTTAATTTT TGTTGGTTTA 
2601 ACTATCCCAC ATTATAAATC ATCCTTCACC ATTTTATATC AGTTAAATAT 
2651 GGGTGTGTTG GGGAGGAATG ACTGGCATGT AGACATGTAT TGATTTAGGA 
2701 AGATCTGAGC ATTTCTTTCA TTGTTGGTAA GATATAATGA TGAAATTTAA
2751 AAAGCAGTAT GGAGCATTAT ATATCAGTAA TGTGATATAT ATACTTAAGC

2801 CAGTTTAACC ATTTTGGGAA ATGTTAGCAT TAGGAAATAA AATCCAAAAG 

2851 AAGGAAGAGA AGCTATATGC AATGCAAAAT TTGCTTATTG CAATATTTTC 

2901 ATATACAGAC ACTAAAAACA GTTTTCAAAG TCCAGCATTA CGTAACTAAA 

2951 GTAAGTAAAA TGATGTGTAT CAACTTGATG GTAAAATATG TAGTTATTTA 

3001 AAAAAGCAAT GAACAATTTA GTTTCATGAG AAAATGTTGC CCCCTAAAAG 

3051 TAGAACACAT ATGTTACAAC TGCAATAATA CTCTGAATTC ATCTTTCACA 

3101 AATAAGAGAC ATGTTAGCAT AGTGATTAAA AGCACAGATA TTGGAGACAA 

3151 ACTAACCCAG TTTGAACCCT GGCACTGCCA CGTATAGCAC TGCAGCCTTG 

3201 GGAAAGTTAT TTAAACTCAT GGGCTTCAGT TTCAACATCT GTAAAATGGG 

3251 CATGTTAACA TTGCCTACCT CATAGGATTA CTGTGAGAAT TTTCTAAGTT 

3301 AATATATGTA AAGCAACTTT AAAAAGTGCC TGGCACTTAG TTATTGTTAA 

3351 GTAAGTGTCT GCAGATGCAA GTTTGGAAGA GAAAAGCAAA TAAATGAAAA 

3401 TCCCTTCCTG TTAAGATGAA AAAAAAAAAA AAAAAAAAAA AAAAAAGGGG 

3451 CGGCCGCTCA AGATGAAAAA AAAAAAAAAA AAAAAAAAAA AAAGG 



BLAST Results 

No BLAST result 

Medline entries 

No Medline entry 



Peptide information for frame 2 



ORF from 35 bp to 2137 bp; peptide length: 701 
Category: similarity to unknown protein 
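The reading frame and peptide length quoted for each ORF can be re-derived from the coordinates alone. The sketch below assumes that ORF positions are 1-based and inclusive and that the stop codon is excluded; this is an assumption, although it agrees with the four ORFs reported in this section. The function names are illustrative only.

```python
def reading_frame(start: int) -> int:
    """Forward reading frame (1, 2 or 3) for a 1-based ORF start position."""
    return (start - 1) % 3 + 1


def peptide_length(start: int, end: int) -> int:
    """Number of codons between 1-based inclusive ORF coordinates."""
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3


# (start, end) pairs as reported for the ORFs in this section.
reported_orfs = [(35, 2137), (69, 2084), (144, 2294), (110, 553)]

for start, end in reported_orfs:
    print(start, end, "frame", reading_frame(start),
          "length", peptide_length(start, end))
```

For the ORF from 35 bp to 2137 bp this gives frame 2 and 701 codons, matching the reported peptide length.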



1 MEEDTDYRIR FSSLCFFNDH VGFHGTIKSS PSDFIVIEID EQGQLVNKTI 
51 DEPIFKISEI QLEPNNFPKK PKLDLQNLSL EDGRNQEVHT LIKYTDGDQN 
101 HQSGSEKEDT IVDGTSKCEE KADVLSSFLD EKTHELLNNF ACDVREKWLS 
151 KTELIGLPPE FSIGRILDKN QRASLHSAIR QKFPFLVTVG KNSEIVVKPN 
201 LEYKELCHLV SEEEAFDFFK YLDAKKENSK FTFKPDTNKD HRKAVHHFVN 
251 KKFGNLVETK SFSKMNCSAG NPNVVVTVRF REKAHKRGKR PLSECQEGKV 
301 IYTAFTLRKE NLEMFEAIGF LAIKLGVIPS DFSYAGLKDK KAITYQAMVV 
351 RKVTPERLKN IEKEIEKKRM NVFNIRSVDD SLRLGQLKGN HFDIVIRNLK 
401 KQINDSANLR ERIMEAIENV KKKGFVNYYG PQRFGKGRKV HTDQIGLALL 
451 KNEMMKAIKL FLTPEDLDDP VNRAKKYFLQ TEDAKGTLSL MPEFKVRERA 
501 LLEALHRFGM TEEGCIQAWF SLPHSMRIFY VHAYTSKIWN EAVSYRLETY 
551 GARVVQGDLV CLDEDIDDEN FPNSKIHLVT EEEGSANMYA IHQVVLPVLG 
601 YNIQYPKNKV GQWYHDILSR DGLQTCRFKV PTLKLNIPGC YRQILKHPCN 
651 LSYQLMEDHD IDVKTKGSHI DETALSLLIS FDLDASCYAT VCLKEIMKHD 
701 V 

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_15g14, frame 2

TREMBL:SPBC1A45P_10 gene: "SPBC1A4.09"; product: "hypothetical
protein"; S.pombe chromosome II cosmid c1A4 left hand region 1-26184 bp
Originates from chimeric cosmid., N = 3, Score = 511, P = 2.9e-57

PIR:S67136 hypothetical protein YOR243c - yeast (Saccharomyces
cerevisiae), N = 2, Score = 516, P = 7.3e-54

SWISSPROT:YQ4B_CAEEL HYPOTHETICAL 64.6 KD PROTEIN B0024.11 IN
CHROMOSOME V., N = 2, Score = 386, P = 2.1e-34

>PIR:S67136 hypothetical protein YOR243c - yeast (Saccharomyces cerevisiae) 
Length = 676

HSPs:

Score = 516 (77.4 bits), Expect = 7.3e-54, Sum P(2) = 7.3e-54
Identities = 151/498 (30%), Positives = 245/498 (49%)

Query: 191 KNSEIVVKPNLEYKELCHLVSEEEAFDFFK-YLDAKKENSKFTFKPDTNKDHRKAVHHFV 249
+ E V P L +L + EE+ Y A K + F+ +K R +H +
Sbjct: 109 RRQEFNVDPELR-NQLVEIFGEEDVLKIESVYRTANKMETAKNFE---DKSVRTKIHQLL 164

Query: 250 NKKFGNLVETKSFSKMNCSAGNPNVVVTVRFREKAHK-RGKRPLSECQEG-KVIYTAFTL 307
+ F N +E+ + N +EK ++ R + G + FTL
Sbjct: 165 REAFKNELESVTTDTNTFKIARSNRNSRTNKQEKINQTRDANGVENWGYGPSKDFIHFTL 224

Query: 308 RKENLEMFEAIGFLAIKLGVIPSD-FSYAGLKDKKAITYQAMVVRKVTPERLKNIEKEIE 366
KEN + EA+ + KL +PS YAG KD++A+T Q + + K+ +RL + + +
Sbjct: 225 HKENKDTMEAVNVIT-KLLRVPSRVIRYAGTKDRRAVTCQRVSISKIGLDRLNALNRTL- 282

Query: 367 KKRMNVFNIRSVDDSLRLGQLKGNHFDIVIRNLKKQINDSANLRERIMEAIENVKKKGFV 426
K M + N D SL LG LKGN F +VIR++ N +L E + +++ + GF+
Sbjct: 283 -KGMIIGNYNFSDASLNLGDLKGNEFVVVIRDVTTG-NSEVSLEEIVSNGCKSLSENGFI 340

Query: 427 NYYGPQRFGKGRKVHTDQIGLALLKNEMMKAIKLFLTPEDLDDPVNR-AKKYFLQTEDAK 485
NY+G QRFG + T IG LL + KA +L L+ +D P ++ A+K + +T+DA
Sbjct: 341 NYFGMQRFGTF-SISTHTIGRELLLSNWKKAAELILSDQDNVLPKSKEARKIWAETKDAA 399

Query: 486 GTLSLMPEFKVRERALLEALHRFGMTEEGCIQ--AWFS----LPHSMRIFYVHAYTSKIW 539
L MP + E ALL +L E+G A+++ +P ++R YVHAY S +W
Sbjct: 400 LALKQMPRQCLAENALLYSLSNQRKEEDGTYSENAYYTAIMKIPRNLRTMYVHAYQSYVW 459

Query: 540 NEAVSYRLETYGARVVQGDLVC-------LDEDIDDENFPNS-------KIHLVTEEEGS 585
N S R+E +G ++V GDLV L IDDE+F + VT+E+
Sbjct: 460 NSIASKRIELHGLKLVVGDLVIDTSEKSPLISGIDDEDFDEDVREAQFIRAKAVTQEDID 519

Query: 586 ANMYAIHQVVLPVLGYNIQYPKNK-VGQWYHDILSRDGLQTCRFKVPTLKLNIPGCYRQI 644
+ Y + VVLP G+++ YP N+ + Q Y DIL D + + ++ G YR +
Sbjct: 520 SVKYTMEDVVLPSPGFDVLYPSNEELKQLYVDILKADNMDPFNMRRKVRDFSLAGSYRTV 579

Query: 645 LKHPCNLSYQLMEDHDIDVKTKGSHID 671
++ P +L Y+++ D + + +D
Sbjct: 580 IQKPKSLEYRIIHYDDPSQQLVNTDLD 606

Score = 86 (12.9 bits), Expect = 3.2e-01, Sum P(2) = 2.8e-01
Identities = 40/160 (25%), Positives = 77/160 (48%)

Query: 22 GFHGTIKSSPSDFIVIEIDEQGQLVNKTIDEPIFKISEIQLEPNNFPKKPKLDLQNLSLE 81
GF G IK +DF+V EID++G++++ T D+ FK+ + +P K +++ + S E
Sbjct: 55 GFRGQIKQRYTDFLVNEIDQEGKVIHLT-DKG-FKMPK---KPQR--SKEEVNAEKES-E 106

Query: 82 DGRNQEVHTLIKYTDGDQNHQSGS--EKEDTI-VDGTSKCEEKADVLSSFLDEKTHELLN 138
R QE + D + +Q +ED + ++ + K + +F D+ ++
Sbjct: 107 AARRQEFNV-----DPELRNQLVEIFGEEDVLKIESVYRTANKMETAKNFEDKSVRTKIH 161

Query: 139 NFACDVREKWLSKTELIGLPPE-FSIGRILDKNQRASLHSAIRQ 181
+RE + ++ E + FIR ++N R + I Q
Sbjct: 162 QL---LREAFKNELESVTTDTNTFKIARS-NRNSRTNKQEKINQ 201

Score = 58 (8.7 bits), Expect = 7.3e-54, Sum P(2) = 7.3e-54
Identities = 10/23 (43%), Positives = 17/23 (73%)

Query: 676 SLLISFDLDASCYATVCLKEIMK 698
++++ F L S YAT+ L+E+MK
Sbjct: 638 AVVLKFQLGTSAYATMALRELMK 660
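The percentages in the HSP headers above can be checked against the raw counts. In this report they appear to be truncated rather than rounded (17/23 is given as 73%, not 74%); that reading is an inference from the numbers shown here, not a documented property of the program that produced them. A minimal Python sketch, with an illustrative function name:

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Percentage of matching positions, truncated toward zero."""
    return 100 * matches // aligned


# (identities, positives, aligned length) for the three HSPs above.
hsps = [(151, 245, 498), (40, 77, 160), (10, 17, 23)]

for identities, positives, aligned in hsps:
    print(f"Identities = {identities}/{aligned} "
          f"({blast_percent(identities, aligned)}%), "
          f"Positives = {positives}/{aligned} "
          f"({blast_percent(positives, aligned)}%)")
```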





Pedant information for DKFZphtes3_15g14, frame 2

Report for DKFZphtes3_15g14.2

[LENGTH] 701
[MW] 80700.96
[pI] 7.31
[HOMOL] PIR:S67136 hypothetical protein YOR243c - yeast (Saccharomyces cerevisiae) 2e-51
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOR243c] 8e-53
[BLOCKS] BL01268C
[BLOCKS] BL01268B
[BLOCKS] BL01268A
[SUPFAM] hypothetical protein HI0701 3e-06
[PROSITE] MYRISTYL 7
[PROSITE] AMIDATION 2
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 16
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] PKC_PHOSPHO_SITE 13
[PROSITE] ASN_GLYCOSYLATION 5
[KW] Alpha_Beta



SEQ MEEDTDYRIRFSSLCFFNDHVGFHGTIKSSPSDFIVIEIDEQGQLVNKTIDEPIFKISEI 

PRD ccccceeeeeecceeecccccccceeeeecccceeeeeecccceeeeeccccceeeeeee 

SEQ QLEPNNFPKKPKLDLQNLSLEDGRNQEVHTLIKYTDGDQNHQSGSEKEDTIVDGTSKCEE 

PRD cccccccccccccccccccccccccccccceeeeccccccccccccceeeeeecccccch 

SEQ KADVLSSFLDEKTHELLNNFACDVREKWLSKTELIGLPPEFSIGRILDKNQRASLHSAIR 

PRD hhhhhhhhhhhhhhhhhhhcchhhhhhhhhhheeecccccceeeeeeecchhhhhhhhhh 

SEQ QKFPFLVTVGKNSEIVVKPNLEYKELCHLVSEEEAFDFFKYLDAKKENSKFTFKPDTNKD

PRD hhccceeeecccceeeecccchhhhhhhhhhhhhhhhhhhhhhccc.ccceeeecccccch 

SEQ HRKAVHHFVNKKFGNLVETKSFSKMNCSAGNPNVVVTVRFREKAHKRGKRPLSECQEGKV 

PRD hhhhhhhhhhhhhhheeeeecccceeeecccccceeeechhhhhhhhcccccccccccce 

SEQ IYTAFTLRKENLEMFEAIGFLAIKLGVIPSDFSYAGLKDKKAITYQAMVVRKVTPERLKN

PRD eeeeeeeeccccchhhhhhhhhhhhcccccceeeccccchhhhhhhheeeccccchhhhh 

SEQ IEKEIEKKRMNVFNIRSVDDSLRLGQLKGNHFDIVIRNLKKQINDSANLRERIMEAIENV 

PRD hhhhhhhhhheeeeeeccccccccccccccceeeeeehhhhhccccchhhhhhhhhhhhh 

SEQ KKKGFVNYYGPQRFGKGRKVHTDQIGLALLKNEMMKAIKLFLTPEDLDDPVNRAKKYFLQ 

PRD hhcccccccccccccccccccchhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhh 

SEQ TEDAKGTLSLMPEFKVRERALLEALHRFGMTEEGCIQAWFSLPHSMRIFYVHAYTSKIWN

PRD hcccchhhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhcccchhhhhhhhhhhhhhh 

SEQ EAVSYRLETYGARVVQGDLVCLDEDIDDENFPNSKIHLVTEEEGSANMYAIHQVVLPVLG

PRD hhhhhhhhhhcceeeccceeeeccccccccccccccceeecccccccccccceeeccccc 

SEQ YNIQYPKNKVGQWYHDILSRDGLQTCRFKVPTLKLNIPGCYRQILKHPCNLSYQLMEDHD 

PRD cccccccccchhhhhhhhhhccccccccccccccccccchhhhhhhhccchhhhhhhhcc 

SEQ IDVKTKGSHIDETALSLLISFDLDASCYATVCLKEIMKHDV

PRD ceeeccccchhhhhhheeeeeecccccchhhhhhhhhhccc 



Prosite for DKFZphtes3_15g14.2

PS00001 47->51 ASN_GLYCOSYLATION PDOC00001
PS00001 77->81 ASN_GLYCOSYLATION PDOC00001
PS00001 266->270 ASN_GLYCOSYLATION PDOC00001
PS00001 404->408 ASN_GLYCOSYLATION PDOC00001
PS00001 650->654 ASN_GLYCOSYLATION PDOC00001
PS00004 351->355 CAMP_PHOSPHO_SITE PDOC00004
PS00005 26->29 PKC_PHOSPHO_SITE PDOC00005
PS00005 105->108 PKC_PHOSPHO_SITE PDOC00005
PS00005 115->118 PKC_PHOSPHO_SITE PDOC00005
PS00005 232->235 PKC_PHOSPHO_SITE PDOC00005
PS00005 237->240 PKC_PHOSPHO_SITE PDOC00005
PS00005 277->280 PKC_PHOSPHO_SITE PDOC00005
PS00005 306->309 PKC_PHOSPHO_SITE PDOC00005
PS00005 381->384 PKC_PHOSPHO_SITE PDOC00005
PS00005 525->528 PKC_PHOSPHO_SITE PDOC00005
PS00005 535->538 PKC_PHOSPHO_SITE PDOC00005
PS00005 544->547 PKC_PHOSPHO_SITE PDOC00005
PS00005 625->628 PKC_PHOSPHO_SITE PDOC00005
PS00005 632->635 PKC_PHOSPHO_SITE PDOC00005
PS00006 30->34 CK2_PHOSPHO_SITE PDOC00006
PS00006 49->53 CK2_PHOSPHO_SITE PDOC00006
PS00006 79->83 CK2_PHOSPHO_SITE PDOC00006
PS00006 95->99 CK2_PHOSPHO_SITE PDOC00006
PS00006 103->107 CK2_PHOSPHO_SITE PDOC00006
PS00006 105->109 CK2_PHOSPHO_SITE PDOC00006
PS00006 110->114 CK2_PHOSPHO_SITE PDOC00006
PS00006 116->120 CK2_PHOSPHO_SITE PDOC00006
PS00006 127->131 CK2_PHOSPHO_SITE PDOC00006
PS00006 150->154 CK2_PHOSPHO_SITE PDOC00006
PS00006 211->215 CK2_PHOSPHO_SITE PDOC00006
PS00006 237->241 CK2_PHOSPHO_SITE PDOC00006
PS00006 377->381 CK2_PHOSPHO_SITE PDOC00006
PS00006 463->467 CK2_PHOSPHO_SITE PDOC00006
PS00006 580->584 CK2_PHOSPHO_SITE PDOC00006
PS00006 668->672 CK2_PHOSPHO_SITE PDOC00006
PS00007 537->546 TYR_PHOSPHO_SITE PDOC00007
PS00008 25->31 MYRISTYL PDOC00008
PS00008 43->49 MYRISTYL PDOC00008
PS00008 114->120 MYRISTYL PDOC00008
PS00008 326->332 MYRISTYL PDOC00008
PS00008 385->391 MYRISTYL PDOC00008
PS00008 514->520 MYRISTYL PDOC00008
PS00008 622->628 MYRISTYL PDOC00008
PS00009 287->291 AMIDATION PDOC00009
PS00009 436->440 AMIDATION PDOC00009
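The PS00001 (ASN_GLYCOSYLATION) rows in this report can be reproduced with a simple pattern scan. The sketch below writes the PROSITE pattern N-{P}-[ST]-{P} as a regular expression and scans only the first 100 residues of the frame 2 peptide; the helper name and the `start->start+width` output convention are assumptions chosen to mirror the ranges printed in this listing.

```python
import re

# PROSITE PS00001: N-{P}-[ST]-{P}. The lookahead reports overlapping sites.
ASN_GLYCOSYLATION = re.compile(r"(?=N[^P][ST][^P])")

# First 100 residues of the 701-residue frame 2 peptide listed earlier.
fragment = (
    "MEEDTDYRIRFSSLCFFNDHVGFHGTIKSSPSDFIVIEIDEQGQLVNKTI"
    "DEPIFKISEIQLEPNNFPKKPKLDLQNLSLEDGRNQEVHTLIKYTDGDQN"
)


def site_ranges(pattern, peptide, width):
    """Return 'start->end' strings with a 1-based start and end = start + width."""
    ranges = []
    for match in pattern.finditer(peptide):
        start = match.start() + 1
        ranges.append(f"{start}->{start + width}")
    return ranges


print(site_ranges(ASN_GLYCOSYLATION, fragment, 4))
```

Within this fragment the scan reports sites starting at residues 47 and 77, in agreement with the first two PS00001 ranges.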



(No Pfam data available for DKFZphtes3_15g14.2)

DKFZphtes3_15h1



group: testes derived 

DKFZphtes3_15h1 encodes a novel 672 amino acid protein with very weak similarity to several
proteins.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

similarity to Hsp70/Hsp90 organizing protein 

complete cDNA, complete cds, no EST hits 

Sequenced by GBF 

Locus : unknown 

Insert length: 2277 bp 

Poly A stretch at pos. 2252, polyadenylation signal at pos. 2226 

1 AAACCAGATA GAGGTTCTCC AGCTTTTCTT TGATTGTCTC TGCTTTAGCG 
51 TCTCTAAATC CGGTCACCAT GTCGGACCCC GAAGGCGAGA CCTTGCGAAG 

101 CACCTTTCCC TCTTATATGG CCGAAGGCGA GCGGCTCTAC CTGTGCGGGG 

151 AATTTTCTAA AGCCGCGCAG AGCTTCAGCA ACGCTCTTTA CCTTCAGGAT 

201 GGAGACAAGA ACTGCCTGGT TGCTCGCTCA AAGTGCTTCC TGAAGATGGG 

251 AGACTTGGAG AGATCCCTGA AGGATGCTGA GGCTTCGCTC CAGAGTGACC 

301 CAGCTTTCTG TAAGGGGATT TTGCAAAAGG CTGAGACACT GTACACCATG 

351 GGAGACTTTG AGTTTGCCTT GGTATTCTAT CATCGAGGCT ACAAGCTGAG 

401 GCCTGATCGG GAATTCAGAG TTGGCATTCA GAAAGCCCAG GAAGCCATCA 

451 ACAACTCAGT GGGAAGTCCT TCTTCCATTA AGCTGGAGAA CAAAGGGGAC 

501 CTCTCCTTCT TAAGCAAGCA GGCTGAGAAT ATAAAAGCCC AGCAGAAGCC 

551 TCAGCCCATG AAACACCTCT TACACCCCAC CAAGGGAGAG CCCAAGTGGA 

601 AGGCCTCGCT CAAGAGTGAG AAGACTGTCC GCCAGCTTCT GGGGGAGCTC 

651 TACGTGGACA AAGAGTATTT GGAGAAGCTC CTATTGGATG AAGACCTGAT 

701 CAAAGGCACC ATGAAGGGCG GCCTGACTGT GGAGGACCTC ATCATGACGG 

751 GCATCAACTA CCTGGATACT CACAGCAACT TCTGGAGGCA GCAGAAGCCG 

801 ATCTACGCCA GGGAGCGGGA CCGGAAGCTG ATGCAAGAGA AATGGCTGCG 

851 GGACCACAAA CGCCGTCCCT CACAGACAGC CCATTACATC CTCAAGAGCC 

901 TGGAGGACAT TGATATGTTG CTCACAAGTG GCAGTGCTGA AGGGAGTCTT 

951 CAGAAAGCTG AGAAAGTGCT GAAGAAGGTA CTGGAATGGA ACAAGGAAGA 
1001 GGTACCCAAC AAGGATGAAC TGGTTGGAAA CTTGTATAGC TGCATAGGGA 
1051 ATGCCCAGAT TGAGCTGGGG CAGATGGAGG CAGCCCTGCA GAGCCACAGA 
1101 AAGGACCTGG AGATCGCCAA GGAATATGAC CTTCCTGATG CAAAATCGAG 
1151 AGCCCTTGAC AACATTGGCA GAGTTTTTGC CAGAGTTGGG AAATTCCAGC 
1201 AAGCCATTGA CACGTGGGAA GAAAAGATCC CTCTGGCAAA AACCACCCTG 
1251 GAGAAGACCT GGCTGTTCCA CGAGATCGGC CGCTGCTACT TGGAGCTGGA 
1301 CCAGGCCTGG CAGGCCCAGA ATTATGGCGA GAAGTCCCAG CAGTGTGCCG 
1351 AGGAGGAAGG GGACATTGAG TGGCAACTGA ATGCCAGTGT TCTGGTGGCC 
1401 CAGGCACAAG TGAAGCTGAG AGACTTGGAG TCAGCCGTGA ACAATTTTGA 
1451 GAAGGCCCTG GAGAGAGCAA AGCTTGTGCA TAACAACGAG GCGCAGCAGG 
1501 CCATCATCAG TGCCTTGGAC GATGCCAACA AGGGTATCAT CAGAGAACTG 
1551 AGGAAAACCA ACTACGTGGA GAATCTCAAA GAAAAAAGCG AGGGAGAAGC 
1601 TTCACTGTAT GAAGATAGAA TAATAACAAG AGAGAAGGAC ATGAGGAGAG 
1651 TGAGAGATGA GCCCGAGAAG GTGGTGAAGC AGTGGGACCA TAGTGAGGAT 
1701 GAGAAAGAGA CAGATGAGGA CGATGAGGCT TTTGGGGAAG CTCTGCAGAG 
1751 CCCAGCAAGC GGAAAGCAGA GTGTGGAAGC AGGAAAAGCC AGAAGCGATT 
1801 TGGGAGCAGT TGCCAAGGGC CTGTCAGGAG AATTAGGCAC AAGATCAGGA 
1851 GAAACAGGCA GGAAGCTACT AGAAGCTGGC AGAAGAGAGT CAAGAGAAAT 
1901 TTATAGGAGG CCTTCGGGAG AATTAGAGCA AAGACTCTCA GGAGAATTCA 
1951 GCAGACAGGA ACCAGAAGAA CTAAAGAAAC TTTCAGAAGT GGGCAGAAGA 
2001 GAGCCAGAAG AACTGGGAAA AACACAATTT GGAGAAATAG GAGAAACGAA 
2051 AAAAACAGGA AATGAGATGG AAAAGGAATA TGAATGAAGC CATCGGTAGA 
2101 GATGAGGATC AGGAAGCTGG TGTTCAGAGG GATCATGGGA TTTTATTAAA 
2151 CTGGATTTTC AAGCGATTTG TCTGTTATAG GAAAAATGAG GGTTTTACTT 
2201 CTGCTGCTTT CCATCACTAT TTTGCCATTA AATAGGTGTC TTTCACTCTT 
2251 GCAAAAAAAA AAAAAAAAAA AAAAAAA 



BLAST Results 



No BLAST result

Medline entries



No Medline entry 



Peptide information for frame 3 



ORF from 69 bp to 2084 bp; peptide length: 672 
Category: similarity to known protein 



1 MSDPEGETLR STFPSYMAEG ERLYLCGEFS KAAQSFSNAL YLQDGDKNCL 
51 VARSKCFLKM GDLERSLKDA EASLQSDPAF CKGILQKAET LYTMGDFEFA 
101 LVFYHRGYKL RPDREFRVGI QKAQEAINNS VGSPSSIKLE NKGDLSFLSK 
151 QAENIKAQQK PQPMKHLLHP TKGEPKWKAS LKSEKTVRQL LGELYVDKEY 
201 LEKLLLDEDL IKGTMKGGLT VEDLIMTGIN YLDTHSNFWR QQKPIYARER 
251 DRKLMQEKWL RDHKRRPSQT AHYILKSLED IDMLLTSGSA EGSLQKAEKV 
301 LKKVLEWNKE EVPNKDELVG NLYSCIGNAQ IELGQMEAAL QSHRKDLEIA 
351 KEYDLPDAKS RALDNIGRVF ARVGKFQQAI DTWEEKIPLA KTTLEKTWLF 
401 HEIGRCYLEL DQAWQAQNYG EKSQQCAEEE GDIEWQLNAS VLVAQAQVKL 
451 RDFESAVNNF EKALERAKLV HNNEAQQAII SALDDANKGI IRELRKTNYV 
501 ENLKEKSEGE ASLYEDRIIT REKDMRRVRD EPEKVVKQWD HSEDEKETDE
551 DDEAFGEALQ SPASGKQSVE AGKARSDLGA VAKGLSGELG TRSGETGRKL 
601 LEAGRRESRE IYRRPSGELE QRLSGEFSRQ EPEELKKLSE VGRREPEELG 
651 KTQFGEIGET KKTGNEMEKE YE 
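The start of this peptide can be checked against the cDNA. The sketch below translates the part of the sequence row beginning at position 51 that lies inside the ORF (which starts at 69 bp), using the standard genetic code. The table construction, the variable names and the decision to translate only this one row are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate complete codons from the start of dna; trailing bases are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


row_start = 51   # 1-based position of the first base in the row
row = "TCTCTAAATCCGGTCACCATGTCGGACCCCGAAGGCGAGACCTTGCGAAG"
orf_start = 69   # 1-based ORF start reported for frame 3

print(translate(row[orf_start - row_start:]))
```

The ten complete codons in this row translate to MSDPEGETLR, the first ten residues of the peptide.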

BLASTP hits 
Entry AF039202_1 from database TREMBL: 

product: "Hsp70/Hsp90 organizing protein"; Cricetulus griseus 

Hsp70/Hsp90 organizing protein mRNA, complete cds. 

Score = 149, P = 5.3e-07, identities = 42/160, positives = 74/160

Entry AI09782_1 from database TREMBL: 

product: "myosin heavy chain"; Argopecten irradians myosin heavy chain 

mRNA, complete cds.

Score = 155, P = 6.1e-07, identities = 140/623, positives = 256/623 

Entry S56658 from database PIR: 
stress-induced protein sti1 - soybean

Score = 156, P = 9.7e-08, identities = 41/153, positives = 72/153



Alert BLASTP hits for DKFZphtes3_15h1, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphtes3_15h1, frame 3

Report for DKFZphtes3_15h1.3

[LENGTH] 672
[MW] 76655.61
[pI] 5.49
[HOMOL] PIR:S56658 stress-induced protein sti1 - soybean 6e-10
[SUPFAM] tetratricopeptide repeat homology 1e-07
[PROSITE] MYRISTYL 7
[PROSITE] AMIDATION 3
[PROSITE] CAMP_PHOSPHO_SITE 4
[PROSITE] CK2_PHOSPHO_SITE 15
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] PKC_PHOSPHO_SITE 11
[PROSITE] ASN_GLYCOSYLATION 2
[KW] All_Alpha
[KW] LOW_COMPLEXITY 4.76 %



SEQ MSDPEGETLRSTFPSYMAEGERLYLCGEFSKAAQSFSNALYLQDGDKNCLVARSKCFLKM
SEG
PRD cccccccceeeccccccccccccccccchhhhhhhhhhhhhhccccceeehhhhhhhhhh

SEQ GDLERSLKDAEASLQSDPAFCKGILQKAETLYTMGDFEFALVFYHRGYKLRPDREFRVGI
SEG
PRD hcchhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccchhhhhh

SEQ QKAQEAINNSVGSPSSIKLENKGDLSFLSKQAENIKAQQKPQPMKHLLHPTKGEPKWKAS
SEG
PRD hhhhhhhhhhhhhhhhhhhhccchhhhhhhchhhhhhhcccchhhhhhcccccccchhhh

SEQ LKSEKTVRQLLGELYVDKEYLEKLLLDEDLIKGTMKGGLTVEDLIMTGINYLDTHSNFWR
SEG xxxxxxxxxxxxxxxxxx
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhccccccccccccc

SEQ QQKPIYARERDRKLMQEKWLRDHKRRPSQTAHYILKSLEDIDMLLTSGSAEGSLQKAEKV
SEG
PRD cchhhhhhhhhhhhhhhhhhhhccccccchhhhhhhhhhhheeeeeccccchhhhhhhhh

SEQ LKKVLEWNKEEVPNKDELVGNLYSCIGNAQIELGQMEAALQSHRKDLEIAKEYDLPDAKS
SEG
PRD hhhhhhhhcccccccceeecccccccchhhhhhhhhhhhhhhhhhhhhhhhhcccccchh

SEQ RALDNIGRVFARVGKFQQAIDTWEEKIPLAKTTLEKTWLFHEIGRCYLELDQAWQAQNYG
SEG
PRD hhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhh

SEQ EKSQQCAEEEGDIEWQLNASVLVAQAQVKLRDFESAVNNFEKALERAKLVHNNEAQQAII
SEG
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhh

SEQ SALDDANKGIIRELRKTNYVENLKEKSEGEASLYEDRIITREKDMRRVRDEPEKVVKQWD
SEG x
PRD hhhhccchhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhccceeeeecc

SEQ HSEDEKETDEDDEAFGEALQSPASGKQSVEAGKARSDLGAVAKGLSGELGTRSGETGRKL
SEG xxxxxxxxxxxxx
PRD ccccccccccchhhhhhhcccccccchhhhhccccccceeeeecccccccccccccchhh

SEQ LEAGRRESREIYRRPSGELEQRLSGEFSRQEPEELKKLSEVGRREPEELGKTQFGEIGET
SEG
PRD hhhcccccceeeeccccchhhhhcccccchhhhhhhhhhhcccccccccccccccccccc

SEQ KKTGNEMEKEYE
SEG
PRD cccccccccccc



Prosite for DKFZphtes3_15h1.3

PS00001 128->132 ASN_GLYCOSYLATION PDOC00001
PS00001 438->442 ASN_GLYCOSYLATION PDOC00001
PS00004 265->269 CAMP_PHOSPHO_SITE PDOC00004
PS00004 605->609 CAMP_PHOSPHO_SITE PDOC00004
PS00004 613->617 CAMP_PHOSPHO_SITE PDOC00004
PS00004 636->640 CAMP_PHOSPHO_SITE PDOC00004
PS00005 8->11 PKC_PHOSPHO_SITE PDOC00005
PS00005 66->69 PKC_PHOSPHO_SITE PDOC00005
PS00005 136->139 PKC_PHOSPHO_SITE PDOC00005
PS00005 180->183 PKC_PHOSPHO_SITE PDOC00005
PS00005 183->186 PKC_PHOSPHO_SITE PDOC00005
PS00005 186->189 PKC_PHOSPHO_SITE PDOC00005
PS00005 214->217 PKC_PHOSPHO_SITE PDOC00005
PS00005 342->345 PKC_PHOSPHO_SITE PDOC00005
PS00005 564->567 PKC_PHOSPHO_SITE PDOC00005
PS00005 596->599 PKC_PHOSPHO_SITE PDOC00005
PS00005 660->663 PKC_PHOSPHO_SITE PDOC00005
PS00006 2->6 CK2_PHOSPHO_SITE PDOC00006
PS00006 66->70 CK2_PHOSPHO_SITE PDOC00006
PS00006 93->97 CK2_PHOSPHO_SITE PDOC00006
PS00006 171->175 CK2_PHOSPHO_SITE PDOC00006
PS00006 220->224 CK2_PHOSPHO_SITE PDOC00006
PS00006 277->281 CK2_PHOSPHO_SITE PDOC00006
PS00006 382->386 CK2_PHOSPHO_SITE PDOC00006
PS00006 392->396 CK2_PHOSPHO_SITE PDOC00006
PS00006 481->485 CK2_PHOSPHO_SITE PDOC00006
PS00006 507->511 CK2_PHOSPHO_SITE PDOC00006
PS00006 512->516 CK2_PHOSPHO_SITE PDOC00006
PS00006 542->546 CK2_PHOSPHO_SITE PDOC00006
PS00006 548->552 CK2_PHOSPHO_SITE PDOC00006
PS00006 628->632 CK2_PHOSPHO_SITE PDOC00006
PS00006 663->667 CK2_PHOSPHO_SITE PDOC00006
PS00007 506->515 TYR_PHOSPHO_SITE PDOC00007
PS00008 119->125 MYRISTYL PDOC00008
PS00008 132->138 MYRISTYL PDOC00008
PS00008 213->219 MYRISTYL PDOC00008
PS00008 288->294 MYRISTYL PDOC00008
PS00008 320->326 MYRISTYL PDOC00008
PS00008 334->340 MYRISTYL PDOC00008
PS00008 590->596 MYRISTYL PDOC00008
PS00009 596->600 AMIDATION PDOC00009
PS00009 603->607 AMIDATION PDOC00009
PS00009 641->645 AMIDATION PDOC00009



(No Pfam data available for DKFZphtes3_15h1.3)

DKFZphtes3_15i5



group: cell structure and motility 

DKFZphtes3_15i5 encodes a novel 717 amino acid protein with similarity to radial spokehead
proteins.

The novel protein is similar to the Chlamydomonas reinhardtii radial spokehead protein of
flagella or axoneme and to the Strongylocentrotus purpuratus sea urchin spermatozoa protein
p63. This protein is important for the maintenance of a planar form of sperm flagellar
beating. In addition, the novel protein contains a transferrin signature 1 for iron-binding.
The new protein seems to be a part of the human radial spoke heads in spermatozoa.

BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in modulating the structure of the human spermatozoa
radial spoke head and modulation of sperm motility in men.



strong similarity to "radial spokehead" proteins 

complete cDNA, complete cds, 1 EST hit (from a testis library)
"radial spokehead" part of flagella in Chlamydomonas, this protein
seems to be part of the sperm motor or tail

Sequenced by GBF 

Locus: unknown 

Insert length: 2478 bp 

Poly A stretch at pos. 2452, polyadenylation signal at pos. 2433 



1 CACCCTGGCC CGCTCCCCGC GCCCTCCACG GGTAACGGCC CCCTCTCTCG 
51 GTGCTCAGAA ACCGGCGGTG TCGACAGGTG GCTCTCGCTT GGCCTCCTTG 
101 TCTGCAAGCC TTTCTCCTAG AGATCTGTGC CTCCTGGCGA ACCATGGGAG 
151 ACCTGCCGCC CTACCCTGAG CGCCCTGCCC AGCAGCCTCC GGGCCGGAGG 
201 ACTTCTCAGG CCTCCCAGAG GCGGCACAGT CGGGACCAAG CTCAGGCCCT 
251 GGCAGCGGAC CCCGAGGAGA GGCAGCAGAT ACCTCCAGAC GCCCAGCGAA 
301 ACGCCCCTGG TTGGTCACAG AGGGGCAGCC TGTCCCAACA GGAGAACTTG 
351 CTGATGCCCC AGGTCTTCCA GGCTGAGGAA GCCCGGCTGG GTGGCATGGA 
401 GTACCCATCT GTGAACACGG GCTTTCCCTC AGAGTTCCAG CCTCAGCCTT 
451 ACTCTGATGA AAGCAGGATG CAGGTCGCCG AGCTCACCAC CAGCCTAATG 
501 CTGCAGCGGC TCCAGCAGGG CCAAAGCAGC CTGTTCCAGC AACTGGACCC 
551 CACCTTCCAG GAGCCCCCAG TCAACCCCTT GGGCCAGTTC AACCTCTACC 
601 AGACAGACCA GTTCTCTGAA GGTGCCCAGC ACGGGCCTTA CATAAGGGAT 
651 GACCCTGCCC TTCAGTTCTT GCCCTCTGAG CTGGGCTTCC CACACTACAG 
701 TGCCCAGGTG CCTGAGCCCG AGCCTCTGGA GCTGGCCGTG CAGAACGCCA 
751 AGGCCTACCT GCTGCAGACC AGCATCAATT GCGACCTCAG CCTGTACGAG
801 CACCTGGTAA ATCTGCTGAC CAAGATCCTG AACCAGCGGC CTGAGGACCC 
851 CTTGTCTGTC CTGGAGTCTC TGAACCGCAC CACGCAGTGG GAGTGGTTCC 
901 ACCCCAAGCT GGACACGCTG CGGGACGACC CCGAGATGCA GCCCACCTAC 
951 AAGATGGCGG AGAAACAGAA GGCGCTGTTC ACCCGGAGTG GAGGCGGCAC 
1001 TGAAGGCGAA CAGGAGATGG AGGAGGAGGT GGGGGAGACA CCAGTGCCCA 
1051 ACATCATGGA GACTGCCTTC TACTTCGAGC AGGCCGGCGT CGGCCTGAGC 
1101 TCGGACGAGA GCTTCCGCAT TTTCCTGGCC ATGAAACAGC TGGTGGAGCA 
1151 GCAGCCCATC CACACCTGTC GCTTCTGGGG CAAGATCCTG GGAATCAAAC 
1201 GCAGCTACCT GGTGGCCGAG GTGGAATTCC GGGAGGGCGA GGAGGAGGCA 
1251 GAGGAGGAGG AGGTGGAGGA GATGACGGAA GGTGGCGAGG TCATGGAGGC 
1301 GCACGGCGAG GAGGAGGGCG AGGAGGACGA GGAGAAGGCC GTGGACATCG 
1351 TCCCTAAGTC CGTATGGAAG CCGCCGCCCG TGATCCCCAA GGAGGAGAGC 
1401 CGCTCAGGCG CCAACAAGTA CCTGTACTTT GTGTGCAACG AGCCGGGCCT 
1451 GCCATGGACG CGGCTGCCCC ACGTCACTCC AGCCCAGATC GTGAACGCCC 
1501 GAAAGATCAA GAAGTTCTTC ACAGGCTACC TGGACACGCC AGTCGTCAGC 
1551 TACCCACCCT TCCCGGGCAA CGAGGCCAAC TACCTGCGGG CCCAGATAGC 
1601 CCGCATCTCG GCCGCCACGC AGGTCAGCCC GCTGGGCTTC TACCAGTTTA 
1651 GTGAGGAGGA GGGCGACGAG GAGGAGGAAG GTGGTGCTGG GCGCGACTCC 
1701 TACGAGGAGA ACCCGGACTT CGAGGGCATC CCCGTGCTGG AGCTGGTCGA 
1751 CTCCATGGCC AACTGGGTGC ATCACACACA GCACATCCTG CCGCAGGGCC 
1801 GCTGCACTTG GGTGAACCCT TTGCAGAAGA CAGAGGAGGA GGAGGACCTG 
1851 GGGGAGGAGG AAGAGAAGGC AGATGAGGGG CCAGAGGAGG TGGAGCAGGA 
1901 GGTTGGCCCC CCACTGCTAA CGCCACTTTC AGAAGATGCA GAAATCATGC 
1951 ACCTGGCACC CTGGACCACC CGCCTGTCCT GCAGCCTCTG CCCGCAGTAC 
2001 TCAGTGGCCG TTGTGCGCTC CAACCTCTGG CCCGGGGCCT ATGCCTATGC 
2051 CAGTGGCAAA AAGTTTGAGA ACATCTACAT CGGCTGGGGT CACAAGTACA 
2101 GCCCCGAGAG CTTCAACCCG GCCCTGCCAG CCCCCATTCA ACAAGAGTAC 
2151 CCCAGTGGCC CAGAGATCAT GGAGATGAGT GACCCCACAG TGGAAGAGGA 
2201 GCAGGCTCTG AAAGCAGCCC AGGAACAAGC CCTGGGAGCC ACAGAGGAGG 
2251 AGGAGGAGGG CGAGGAGGAG GAGGAGGGCG AGGAGACAGA TGACTGAGGC
2301 CCACCCTCTA GCCACTTTCC CCAAGCAGGT AGATAGCAAA TTTCCCCTTA
2351 GAGGTAGTTA GCATGGATTA TATTTTCACT ATGTGCTTCC TGTCCCCAGA 
2401 GGGCAGGGAT AGAAAAGGAA GGCAACTGCT TCAAATAAAA TTCCTCCACG 
2451 GCATTAAAAA AAAAAAAAAA AAAAAAAG 



BLAST Results 



No BLAST result 



Medline entries 



86251010:

Molecular cloning and expression of flagellar radial spoke and dynein
genes of Chlamydomonas.

81142496:

Radial spokes of Chlamydomonas flagella: polypeptide composition and
phosphorylation of stalk components.

9450971: 

Molecular cloning and characterization of a radial spoke head protein of sea urchin sperm 
axonemes: involvement of the protein in the regulation of sperm motility. 



ORF from 144 bp to 2294 bp; peptide length: 717 
Category: strong similarity to known protein 



1 MGDLPPYPER PAQQPPGRRT SQASQRRHSR DQAQALAADP EERQQIPPDA 

51 QRNAPGWSQR GSLSQQENLL MPQVFQAEEA RLGGMEYPSV NTGFPSEFQP 

101 QPYSDESRMQ VAELTTSLML QRLQQGQSSL FQQLDPTFQE PPVNPLGQFN 

151 LYQTDQFSEG AQHGPYIRDD PALQFLPSEL GFPHYSAQVP EPEPLELAVQ 

201 NAKAYLLQTS INCDLSLYEH LVNLLTKILN QRPEDPLSVL ESLNRTTQWE 

251 WFHPKLDTLR DDPEMQPTYK MAEKQKALFT RSGGGTEGEQ EMEEEVGETP 

301 VPNIMETAFY FEQAGVGLSS DESFRIFLAM KQLVEQQPIH TCRFWGKILG 

351 IKRSYLVAEV EFREGEEEAE EEEVEEMTEG GEVMEAHGEE EGEEDEEKAV 

401 DIVPKSVWKP PPVIPKEESR SGANKYLYFV CNEPGLPWTR LPHVTPAQIV 

451 NARKIKKFFT GYLDTPVVSY PPFPGNEANY LRAQIARISA ATQVSPLGFY 

501 QFSEEEGDEE EEGGAGRDSY EENPDFEGIP VLELVDSMAN WVHHTQHILP 

551 QGRCTWVNPL QKTEEEEDLG EEEEKADEGP EEVEQEVGPP LLTPLSEDAE 

601 IMHLAPWTTR LSCSLCPQYS VAVVRSNLWP GAYAYASGKK FENIYIGWGH 

651 KYSPESFNPA LPAPIQQEYP SGPEIMEMSD PTVEEEQALK AAQEQALGAT 

701 EEEEEGEEEE EGEETDD 



Entry U73123_1 from database TREMBL:

product: "radial spokehead"; Strongylocentrotus purpuratus radial 
spokehead mRNA, complete cds. 

Score = 1604, P = 7.4e-165, identities = 303/523, positives = 395/523 
Entry B44498 from database PIR: 

radial spoke protein 6 - Chlamydomonas reinhardtii 

Score = 386, P = 3.4e-45, identities = 105/264, positives = 138/264



Peptide information for frame 3 



BLASTP hits 



Alert BLASTP hits for DKFZphtes3_15i5, frame 3 



No Alert BLASTP hits found 



Pedant information for DKFZphtes3_15i5, frame 3 



Report for DKFZphtes3_15i5.3 



[LENGTH] 717
[MW] 80913.61
[pI] 4.36
[HOMOL] TREMBL:U73123_1 product: "radial spokehead"; Strongylocentrotus purpuratus radial spokehead mRNA, complete cds. 1e-130
[PROSITE] TRANSFERRIN_1 1
[PROSITE] MYRISTYL 5
[PROSITE] AMIDATION 2
[PROSITE] CAMP_PHOSPHO_SITE 2
[PROSITE] CK2_PHOSPHO_SITE 14
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] GLYCOSAMINOGLYCAN 1
[PROSITE] PKC_PHOSPHO_SITE 8
[PROSITE] ASN_GLYCOSYLATION 1
[KW] All_Alpha
[KW] LOW_COMPLEXITY 21.48 %





SEQ MGDLPPYPERPAQQPPGRRTSQASQRRHSRDQAQALAADPEERQQIPPDAQRNAPGWSQR

SEG . . . .xxxxxxxxxxxx 

PRD ccccccccccccccccccccchhhhhhhhhhhhhhhhhcccccccccccccccccccccc 

SEQ GSLSQQENLLMPQVFQAEEARLGGMEYPSVNTGFPSEFQPQPYSDESRMQVAELTTSLML 

SEG xxxx 

PRD cccchhhhhhhhhhhhhhhhhhccccccccccccccccccccccchhhhhhhhhhhhhhh 

SEQ QRLQQGQSSLFQQLDPTFQEPPVNPLGQFNLYQTDQFSEGAQHGPYIRDDPALQFLPSEL 

SEG xxxxxxxxxxxxxx 

PRD hhhhhccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ GFPHYSAQVPEPEPLELAVQNAKAYLLQTSINCDLSLYEHLVNLLTKILNQRPEDPLSVL 

SEG 

PRD ccccccccccccccchhhhhhhhhhhhhhccccccchhhhhhhhhhhhhhhccccchhhh 

SEQ ESLNRTTQWEWFHPKLDTLRDDPEMQPTYKMAEKQKALFTRSGGGTEGEQEMEEEVGETP 

SEG xxxxxxxxxxxxxxxx . . 

PRD hhhchhhhhccccccccccccccccchhhhhhhhhhhhhhhcccccchhhhhhhhhcccc 

SEQ VPNIMETAFYFEQAGVGLSSDESFRIFLAMKQLVEQQPIHTCRFWGKILGIKRSYLVAEV 

SEG xxx 

PRD ccchhhhhhhhhhccccccchhhhhhhhhhhhhhhhhccchhhhhhhhcccchhhhhhhh 

SEQ EFREGEEEAEEEEVEEMTEGGEVMEAHGEEEGEEDEEKAVDIVPKSVWKPPPVIPKEESR

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx , 

PRD hhhhhhhhhhhhhhhhhhcccccccccccccchhhhheeeeecccccccccccccccccc 

SEQ SGANKYLYFVCNEPGLPWTRLPHVTPAQIVNARKIKKFFTGYLDTPVVSYPPFPGNEANY 

SEG 

PRD cccceeeeeeeccccccccccccccchhhhhhhhhhhhhhcccccccccccccccchhhh 

SEQ LRAQIARISAATQVSPLGFYQFSEEEGDEEEEGGAGRDSYEENPDFEGIPVLELVDSMAN 

SEG xxxxxxxxxxxxx 

PRD hhhhhhhhhhhhccccccceeeeccccccccccccccccccccccccccceeeecchhhh 

SEQ WVHHTQHILPQGRCTWVNPLQKTEEEEDLGEEEEKADEGPEEVEQEVGPPLLTPLSEDAE 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhcccccccccceeechhhhhhhhhccccchhhhhcccccccccccccccccccccccc 

SEQ IMHLAPWTTRLSCSLCPQYSVAVVRSNLWPGAYAYASGKKFENIYIGWGHKYSPESFNPA

SEG 

PRD cccccccccccccccccccceeeeeeccccceeeecccccceeeeeeccccccccccccc 

SEQ LPAPIQQEYPSGPEIMEMSDPTVEEEQALKAAQEQALGATEEEEEGEEEEEGEETDD 

SEG xxxxxxxxxxxxxx. . .xxxxxxxxxxxxxx. . . 

PRD cccccccccccccceeeeccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccc 



Prosite for DKFZphtes3_15i5.3

PS00001 244->248 ASN_GLYCOSYLATION PDOC00001
PS00002 282->286 GLYCOSAMINOGLYCAN PDOC00002
PS00004 18->22 CAMP_PHOSPHO_SITE PDOC00004
PS00004 26->30 CAMP_PHOSPHO_SITE PDOC00004
PS00005 24->27 PKC_PHOSPHO_SITE PDOC00005
PS00005 58->61 PKC_PHOSPHO_SITE PDOC00005
PS00005 258->261 PKC_PHOSPHO_SITE PDOC00005
PS00005 268->271 PKC_PHOSPHO_SITE PDOC00005
PS00005 323->326 PKC_PHOSPHO_SITE PDOC00005
PS00005 341->344 PKC_PHOSPHO_SITE PDOC00005
PS00005 608->611 PKC_PHOSPHO_SITE PDOC00005
PS00005 637->640 PKC_PHOSPHO_SITE PDOC00005
PS00006 64->68 CK2_PHOSPHO_SITE PDOC00006
PS00006 137->141 CK2_PHOSPHO_SITE PDOC00006

DKFZphtes3_15j18



group: testes derived 

DKFZphtes3_15j18 encodes a novel 148 amino acid protein without similarity to known proteins.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.



unknown 

complete cDNA, complete cds, few EST hits
Sequenced by GBF
Locus: unknown
Insert length: 905 bp

Poly A stretch at pos. 839, polyadenylation signal at pos. 815



1 GTGATTCATA TGCTTCCATA GCAGGTGTCT GCTTCTGAGC CAAGCTCCCA 

51 GGGCAGCGGA GCAGGCACCA ACCAGCATCC CAGGGGAGGG CACAGCTTGT 

101 CCAGCTGGGA TGTTTGGGTG CCCTGTGAGA TGCCCCAAGC CACCAACCCA 

151 GCTTATCTCA GGAGAAGCCT CGGCGGCCCG TCTGCCGGCC TGGAGAGATG 

201 TGCTACAGCA GCCGGGGGTG GGGGGAGAGG GTGGGCTTAG AATCTCTTGG 

251 CAGGGAGCCC CCAAGAGCAG GGTGAGACCT GCCTTCATTT CACCTGTCCC 

301 CTTCACAGTT CTGCAAAGCC AGCATTATCA TCCCTTTTCA GAAGGAGTGG 

351 GCACTCAGGT GGAATGCCTC ACCCCAGTCC TGCGGCTGGA AAGCGATATG 

401 GCCAGGACTG CACCCCACCC CTCATCCCTG CACCCCTTCC CTGCCTGGGA 

451 TTCCTCCAGC CCTGTGCACT GTGGAGCGCC TCTGCCTTCC GCTCATGGAG 

501 GTTTCCCAAG GGCACGCGCT GAGGGCAGCT GGTCTCAGCC TGGGGCCGGG 

551 TCCTAGTAAC TGTCTCTCTT TGCTTTCCAG CCAGTGTTTT GGGGTTTGAA 

601 GTTGGAATCT TCAGCTACTG TCAAGAACAG CCACAAAAAT GTGTCACGAT 

651 CAAGATCTTT GAGAGTCCAC CAATCAGGAG GCGTCTGTGA CAGTCGCTGT 

701 CTTCTCAGAA CAGAATCCAC ACCCAGGATT CAACCCAAAT GATTTCTCAT 

751 CAGGTGATTC TTGGTTGTAG CAAAGTTCAT GTGAATGTGG GTGAGTTTCT 

801 GTTATGAATG TGGTCAATAA ATGTTATTTG TGAAACTCTA AAAAAAAAAA 

851 AAAAAAAAAG GGCGGCCGCT CTAGAGGATC CAAGCTTACG TACGCGAAAA 

901 AAAAG 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 110 bp to 553 bp; peptide length: 148 
Category: putative protein 



1 MFGCPVRCPK PPTQLISGEA SAARLPAWRD VLQQPGVGGE GGLRISWQGA 
51 PKSRVRPAFI SPVPFTVLQS QHYHPFSEGV GTQVECLTPV LRLESDMART 
101 APHPSSLHPF PAWDSSSPVH CGAPLPSAHG GFPRARAEGS WSQPGAGS 
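
The ORF coordinates and the peptide length given above are consistent with each other, which can be confirmed with a short calculation. The sketch below assumes 1-based, inclusive ORF coordinates that exclude the stop codon, and the standard genetic code; the helper names `orf_codons` and `translate` are hypothetical. The test sequence is the first 30 bases of the ORF, taken from row 101 of the nucleotide listing starting at position 110.

```python
from itertools import product

# Standard genetic code in the usual TCAG ordering (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def orf_codons(start, end):
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    span = end - start + 1
    if span % 3:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


def translate(dna):
    """Translate in frame from the first base; stop at the first stop codon."""
    peptide = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


print(orf_codons(110, 553))                          # codons in the reported ORF
print(translate("ATGTTTGGGTGCCCTGTGAGATGCCCCAAG"))   # first ten residues
```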

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15j18, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_15j18, frame 2 

Report for DKFZphtes3_15j18.2 

[LENGTH] 148 
[MW] 15665.78 
[pI] 8.91 
[PROSITE] MYRISTYL 3 
[PROSITE] CK2_PHOSPHO_SITE 1 
[KW] Irregular 



SEQ MFGCPVRCPKPPTQLISGEASAARLPAWRDVLQQPGVGGEGGLRISWQGAPKSRVRPAFI 
PRD cccccccccccccccccccccccchhhhhhhccccccccccceeeeeccccccccccccc 

SEQ SPVPFTVLQSQHYHPFSEGVGTQVECLTPVLRLESDMARTAPHPSSLHPFPAWDSSSPVH 
PRD cccceeeeeccccccccccccccccccchhhhhhhhcccccccccccccccccccccccc 

SEQ CGAPLPSAHGGFPRARAEGSWSQPGAGS 
PRD cccccccccccccccccccccccccccc 
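
The PRD lines encode one predicted state per residue. A small sketch of how such a string could be summarised is given below; it assumes that `h`, `e` and `c` stand for helix, extended strand and coil (the usual convention, but not stated in this report), and the function names are hypothetical.

```python
from itertools import groupby


def runs(prd):
    """Collapse a per-residue state string into (state, run_length) pairs."""
    return [(state, sum(1 for _ in grp)) for state, grp in groupby(prd)]


def composition(prd):
    """Fraction of residues assigned to each state."""
    total = len(prd)
    return {state: prd.count(state) / total for state in sorted(set(prd))}


# Last PRD block of the report: 28 residues, all coil.
last_block = "c" * 28
print(runs(last_block))
print(composition(last_block))
```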



Prosite for DKFZphtes3_15j18.2 

PS00006 82->86 CK2_PHOSPHO_SITE PDOC00006 

PS00008 38->44 MYRISTYL PDOC00008 

PS00008 42->48 MYRISTYL PDOC00008 

PS00008 49->55 MYRISTYL PDOC00008 
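
The four hits listed above can be reproduced by scanning the 148 residue peptide with regular expressions. This is a sketch under stated assumptions: the two patterns are the PROSITE definitions of PS00006 and PS00008 as commonly cited (`[ST]-x(2)-[DE]` and `G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}`) and should be checked against the database release in use; the `scan` helper is hypothetical. A lookahead is used so that overlapping matches, such as the myristoylation sites starting at 38 and 42, are both reported.

```python
import re

# Peptide of frame 2 as given in the "Peptide information" listing.
PEPTIDE = (
    "MFGCPVRCPKPPTQLISGEASAARLPAWRDVLQQPGVGGEGGLRISWQGA"
    "PKSRVRPAFISPVPFTVLQSQHYHPFSEGVGTQVECLTPVLRLESDMART"
    "APHPSSLHPFPAWDSSSPVHCGAPLPSAHGGFPRARAEGSWSQPGAGS"
)

# PROSITE patterns rewritten as Python regexes (assumed definitions).
PATTERNS = {
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",
}


def scan(seq, regex):
    """Return 1-based start positions of all matches, including overlapping ones."""
    return [m.start() + 1 for m in re.finditer("(?=%s)" % regex, seq)]


for name, regex in PATTERNS.items():
    print(name, scan(PEPTIDE, regex))
```

The start positions agree with the listing; the end coordinates printed in the report appear to be the start plus the pattern length.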



(No Pfam data available for DKFZphtes3_15j18.2) 
DKFZphtes3_15j3 



group: nucleic acid management 

DKFZphtes3_15j3 encodes a novel 743 amino acid protein with similarity to proteins of 
unknown function. 

The novel protein contains an RNA recognition motif, predicted by Pfam, and therefore binds to 
RNA. The protein is similar to YGR276c, a ribonuclease H of S. cerevisiae. Thus, the protein 
seems to be a new RNA-modifying protein. 

The new protein can find application in modulating RNA metabolism in human cells and as a 
tool for biotechnological manipulations. 

"44M2.3"; product, differences to genmodel, similarity to ribonuclease H 

complete cDNA, complete cds, EST hits 
YGR276C = ribonuclease H 
differences to genmodel of 44M2.3 

Sequenced by GBF 

Locus: /map="16p11.2" 

Insert length: 2695 bp 

Poly A stretch at pos. 2601, polyadenylation signal at pos. 2579 



1 GCGGTTGTTG TTGGCAGCTG TGGCTAAGGA GGGGAGAACC TCTGCTCCCC 
51 GCCCGTCTTC TCTTCTGCGT TTCCCGGGCT AGGGGGCGTG GGGAGTGGTT 
101 TTAGGCGGCG AAGCCGCTCG GCAGCACCTT CCTTCTTTGC CAGGCAGACG 
151 CCCGTTGTAG CCGTTGGGGA ACCGTTGAGA ATCCGCCATG GAGCCAGAGA 
201 GGGAAGGGAC CGAGAGACAC CCCAGGAAGG TCAGGGAAAG CAGGCAGGCC 
251 CCAAATAAGC TGGTCGGGGC AGCTGAGGCG ATGAAAGCCG GTTGGGATCT 
301 CGAGGAGAGT CAGCCCGAGG CCAAGAAAGC CCGCTTATCT ACCATTTTAT 
351 TTACTGACAA CTGTGAAGTA ACCCATGACC AGCTGTGTGA ATTGCTGAAG 
401 TATGCAGTTC TGGGCAAATC CAATGTTCCA AAACCCAGCT GGTGCCAGCT 
451 TTTTCATCAA AACCACCTAA ACAACGTAGT GGTTTTTGTT CTGCAGGGAA 
501 TGAGTCAGCT ACACTTTTAC AGGTTCTATT TGGAGTTTGG ATGTCTTCGA 
551 AAAGCATTCA GACATAAATT CCGCTTGCCT CCACCATCAT CTGATTTTCT 
601 AGCTGATGTT GTTGGGCTAC AAACTGAACA AAGAGCTGGA GATCTGCCCA 
651 AGACAATGGA AGGGCCTTTA CCTTCTAATG CAAAAGCCGC CATCAACCTT 
701 CAGGATGATC CCATCATTCA AAAGTATGGC TCTAAGAAAG TGGGCTTGAC 
751 CAGATGCCTT CTGACAAAGG AGGAAATGAG AACGTTTCAC TTTCCATTAC 
801 AAGGTTTTCC TGATTGTGAA AACTTTTTAC TTACCAAATG TAATGGTTCT 
851 ATAGCAGACA ATAGTCCTCT CTTTGGACTT GACTGTGAAA TGTGCCTCAC 
901 ATCCAAGGGG AGAGAGCTAA CACGCATCTC ACTGGTTGCT GAAGGAGGCT 
951 GCTGTGTTAT GGATGAACTG GTCAAACCTG AAAACAAGAT TCTGGACTAC 
1001 CTCACCAGCT TTTCGGGAAT CACGAAGAAG ATTCTTAACC CAGTGACGAC 
1051 CAAACTCAAA GATGTACAGA GGCAGTTAAA AGCACTGCTT CCTCCTGATG 
1101 CTGTGTTAGT GGGCCACTCC TTAGATTTGG ATCTCAGAGC ACTGAAAATG 
1151 ATACATCCAT ATGTTATTGA TACATCGTTG CTTTATGTCA GAGAGCAGGG 
1201 CAGAAGATTT AAGCTCAAGT TCTTAGCCAA AGTTATTTTG GGGAAGGATA 
1251 TACAGTGTCC AGACAGACTT GGTCATGATG CCACAGAAGA TGCTAGAACA 
1301 ATCCTTGAAT TGGCTCGGTA TTTCCTTAAG CATGGCCCAA AAAAGATTGC 
1351 AGAACTAAAT CTAGAAGCAC TAGCTAATCA CCAAGAAATA CAAGCAGCAG 
1401 GCCAAGAGCC TAAAAACACA GCAGAAGTAC TTCAGCACCC AAACACAAGT 
1451 GTTTTAGAAT GCTTGGATTC AGTGGGTCAG AAGCTTCTTT TTTTGACCCG 
1501 GGAGACAGAT GCTGGTGAAC TTCCATCTTC CAGAAATTGT CAAACTATTA 
1551 AGTGTCTTTC AAATAAAGAG GTTCTTGAGC AGGCCAGAGT GGAAATCCCC 
1601 CTGTTTCCCT TCAGCATTGT TCAGTTCTCT TTTAAGGCCT TTTCACCTGT 
1651 CCTCACTGAG GAGATGAACA AAAGGATGAG GATCAAGTGG ACAGAGATAT 
1701 CAACTGTCTA TGCTGGGCCA TTTAGCAAAA ATTGCAATCT CAGGGCTCTG 
1751 AAGAGGCTGT TTAAAAGCTT TGGCCCAGTC CAGTCAATGA CTTTTGTTCT 
1801 TGAAACCCGT CAGGTGCAGA GGCCTGTGAC AGAGCTCACG CTTGATTGTG 
1851 ACACCCTCGT GAATGAGCTG GAAGGAGATT CTGAAAACCA AGGCTCTATA 
1901 TATCTGTCTG GAGTGAGTGA AACCTTCAAA GAACAGCTAT TGCAGGAGCC 
1951 CCGCCTCTTT CTTGGCCTGG AAGCTGTGAT CTTGCCTAAA GATCTTAAAA 
2001 GTGGAAAGCA GAAAAAATAC TGTTTCCTGA AATTCAAAAG TTTTGGCAGT 
2051 GCCCAGCAGG CCCTCAACAT TCTCACAGGC AAGGACTGGA AGCTGAAAGG 
2101 CAGGCATGCC CTAACCCCCA GGCACCTCCA TGCCTGGCTC AGAGGCTTAC 
2151 CACCTGAATC AACAAGGCTC CCAGGGCTTC GTGTTGTACC TCCCCCCTTT 
2201 GAACAGGAGG CCTTGCAGAC TCTGAAACTG GACCACCCGA AGATAGCAGC 
2251 CTGGCGCTGG AGCCGGAAGA TTGGAAAGCT CTACAACAGC TTGTGCCCGG 
2301 GCACTCTCTG CCTCATCCTG CTGCCAGGAA CCAAGAGCAC TCATGGTTCA 
2351 CTCTCTGGTC TAGGACTGAT GGGAATAAAA GAGGAAGAAG AAAGCGCTGG 
2401 CCCAGGCCTG TGTTCGTGAG TCGGCCTGCC ATGTTTCCAT GTGCCATTTC 
2451 TTACCCCTTG TAGGCAATGG CAAAGAATGT GGTCAGGCTG TAGCCTCCCC 

2501 AACCAGCAGA CAGTTTTATG GAAACTTGGT ATAGCAGCTA AAAGAGTTTA 

2551 GTTTGTTTAT ATGGCATGTA TAAGTTTTCA ATAAATGCCT AAAGTTCAAG 

2601 CATAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

2651 AGGGCGGCCG CTCTAAAGGA TCCAAGCTTA CGTACGCGAA AAAAG 



BLAST Results 

No BLAST result 

Medline entries 

No Medline entry 

Peptide information for frame 2 



ORF from 188 bp to 2416 bp; peptide length: 743 
Category: similarity to known protein 



1 MEPEREGTER HPRKVRESRQ APNKLVGAAE AMKAGWDLEE SQPEAKKARL 
51 STILFTDNCE VTHDQLCELL KYAVLGKSNV PKPSWCQLFH QNHLNNVVVF 
101 VLQGMSQLHF YRFYLEFGCL RKAFRHKFRL PPPSSDFLAD VVGLQTEQRA 
151 GDLPKTMEGP LPSNAKAAIN LQDDPIIQKY GSKKVGLTRC LLTKEEMRTF 
201 HFPLQGFPDC ENFLLTKCNG SIADNSPLFG LDCEMCLTSK GRELTRISLV 
251 AEGGCCVMDE LVKPENKILD YLTSFSGITK KILNPVTTKL KDVQRQLKAL 
301 LPPDAVLVGH SLDLDLRALK MIHPYVIDTS LLYVREQGRR FKLKFLAKVI 
351 LGKDIQCPDR LGHDATEDAR TILELARYFL KHGPKKIAEL NLEALANHQE 
401 IQAAGQEPKN TAEVLQHPNT SVLECLDSVG QKLLFLTRET DAGELPSSRN 
451 CQTIKCLSNK EVLEQARVEI PLFPFSIVQF SFKAFSPVLT EEMNKRMRIK 
501 WTEISTVYAG PFSKNCNLRA LKRLFKSFGP VQSMTFVLET RQVQRPVTEL 
551 TLDCDTLVNE LEGDSENQGS IYLSGVSETF KEQLLQEPRL FLGLEAVILP 
601 KDLKSGKQKK YCFLKFKSFG SAQQALNILT GKDWKLKGRH ALTPRHLHAW 
651 LRGLPPESTR LPGLRVVPPP FEQEALQTLK LDHPKIAAWR WSRKIGKLYN 
701 SLCPGTLCLI LLPGTKSTHG SLSGLGLMGI KEEEESAGPG LCS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15j3, frame 2 

TREMBL:AC004381_4 gene: "44M2.3"; product: "Unknown gene product"; 
Homo sapiens Chromosome 16 BAC clone CIT987SK-44M2, complete sequence., 
N = 2, Score = 1827, P = 2.1e-284 

TREMBL:AF016430_4 gene: "C05C8.5"; Caenorhabditis elegans cosmid 
C05C8., N = 2, Score = 370, P = 1.7e-34 

PIR:S64609 hypothetical protein YGR276c - yeast (Saccharomyces 
cerevisiae), N = 2, Score = 334, P = 1.8e-27 

TREMBLNEW:SPAC637_9 gene: "SPAC637.09"; product: "putative 
exonuclease"; S.pombe chromosome I cosmid c637., N = 3, Score = 326, P 
= 2.8e-27 

>TREMBL:AC004381_4 gene: "44M2.3"; product: "Unknown gene product"; Homo 
sapiens Chromosome 16 BAC clone CIT987SK-44M2, complete sequence. 
Length = 547 

HSPs: 

Score = 1827 (274.1 bits), Expect = 2.1e-284, Sum P(2) = 2.1e-284 
Identities = 358/373 (95%), Positives = 358/373 (95%) 

Query: 105 MSQLHFYRFYLEFGCLRKAFRHKFRLPPPSSDFLADVVGLQTEQRAGDLPKTMEGPLPSN 164 
           MSQLHFYRFYLEFGCLRKAFRHKFRLPPPSSDFLADVVGLQTEQRAGDLPKTMEGPLPSN 
Sbjct:   1 MSQLHFYRFYLEFGCLRKAFRHKFRLPPPSSDFLADVVGLQTEQRAGDLPKTMEGPLPSN 60 

Query: 165 AKAAINLQDDPIIQKYGSKKVGLTRCLLTKEEMRTFHFPLQGFPDCENFLLTKCNGSIAD 224 
           AKAAINLQDDPIIQKYGSKKVGLTRCLLTKEEMRTFHFPLQGFPDCENFLLTKCNGSIAD 
Sbjct:  61 AKAAINLQDDPIIQKYGSKKVGLTRCLLTKEEMRTFHFPLQGFPDCENFLLTKCNGSIAD 120 

Query: 225 NSPLFGLDCEM---------------CLTSKGRELTRISLVAEGGCCVMDELVKPENKIL 269 
           NSPLFGLDCEM               CLTSKGRELTRISLVAEGGCCVMDELVKPENKIL 
Sbjct: 121 NSPLFGLDCEMARTTFNFSIGVLQAECLTSKGRELTRISLVAEGGCCVMDELVKPENKIL 180 

Query: 270 DYLTSFSGITKKILNPVTTKLKDVQRQLKALLPPDAVLVGHSLDLDLRALKMIHPYVIDT 329 
           DYLTSFSGITKKILNPVTTKLKDVQRQLKALLPPDAVLVGHSLDLDLRALKMIHPYVIDT 
Sbjct: 181 DYLTSFSGITKKILNPVTTKLKDVQRQLKALLPPDAVLVGHSLDLDLRALKMIHPYVIDT 240 

Query: 330 SLLYVREQGRRFKLKFLAKVILGKDIQCPDRLGHDATEDARTILELARYFLKHGPKKIAE 389 
           SLLYVREQGRRFKLKFLAKVILGKDIQCPDRLGHDATEDARTILELARYFLKHGPKKIAE 
Sbjct: 241 SLLYVREQGRRFKLKFLAKVILGKDIQCPDRLGHDATEDARTILELARYFLKHGPKKIAE 300 

Query: 390 LNLEALANHQEIQAAGQEPKNTAEVLQHPNTSVLECLDSVGQKLLFLTRETDAGELPSSR 449 
           LNLEALANHQEIQAAGQEPKNTAEVLQHPNTSVLECLDSVGQKLLFLTRETDAGELPSSR 
Sbjct: 301 LNLEALANHQEIQAAGQEPKNTAEVLQHPNTSVLECLDSVGQKLLFLTRETDAGELPSSR 360 

Query: 450 NCQTIKCLSNKEV 462 
           NCQTIKCLSNKEV 
Sbjct: 361 NCQTIKCLSNKEV 373 

Score = 929 (139.4 bits), Expect = 2.1e-284, Sum P(2) = 2.1e-284 
Identities = 175/179 (97%), Positives = 177/179 (98%) 

Query: 538 LETRQVQRPVTELTLDCDTLVNELEGDSENQGSIYLSGVSETFKEQLLQEPRLFLGLEAV 597 
           L  ++VQRPVTELTLDCDTLVNELEGDSENQGSIYLSGVSETFKEQLLQEPRLFLGLEAV 
Sbjct: 368 LSNKEVQRPVTELTLDCDTLVNELEGDSENQGSIYLSGVSETFKEQLLQEPRLFLGLEAV 427 

Query: 598 ILPKDLKSGKQKKYCFLKFKSFGSAQQALNILTGKDWKLKGRHALTPRHLHAWLRGLPPE 657 
           ILPKDLKSGKQKKYCFLKFKSFGSAQQALNILTGKDWKLKGRHALTPRHLHAWLRGLPPE 
Sbjct: 428 ILPKDLKSGKQKKYCFLKFKSFGSAQQALNILTGKDWKLKGRHALTPRHLHAWLRGLPPE 487 

Query: 658 STRLPGLRVVPPPFEQEALQTLKLDHPKIAAWRWSRKIGKLYNSLCPGTLCLILLPGTK 716 
           STRLPGLRVVPPPFEQEALQTLKLDHPKIAAWRWSRKIGKLYNSLCPGTLCLILLPGTK 
Sbjct: 488 STRLPGLRVVPPPFEQEALQTLKLDHPKIAAWRWSRKIGKLYNSLCPGTLCLILLPGTK 546 
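
The identity figures reported for each HSP follow directly from the aligned rows. The sketch below counts identical columns for one gapped segment of the first HSP; it assumes that the 15 residue insertion in the subject is represented by gap characters in the query (the printed gap symbols did not survive, so their placement is an assumption), and that the reported percentages are truncated rather than rounded. The function name `identity` is hypothetical.

```python
def identity(query, sbjct):
    """Return (identical columns, alignment length, truncated percent identity)."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    ident = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    return ident, len(query), ident * 100 // len(query)


# Query 225-269 against Sbjct 121-180 of the first HSP.
query = "NSPLFGLDCEM" + "-" * 15 + "CLTSKGRELTRISLVAEGGCCVMDELVKPENKIL"
sbjct = "NSPLFGLDCEMARTTFNFSIGVLQAECLTSKGRELTRISLVAEGGCCVMDELVKPENKIL"
print(identity(query, sbjct))

# Whole-HSP figures from the report, reproduced with integer truncation.
print(358 * 100 // 373, 175 * 100 // 179, 177 * 100 // 179)
```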



Pedant information for DKFZphtes3_15j3, frame 2 
Report for DKFZphtes3_15j3.2 

[LENGTH] 743 
[MW] 83536.58 
[pI] 8.87 
[HOMOL] TREMBL:AC004381_4 gene: "44M2.3"; product: "Unknown gene product"; Homo sapiens Chromosome 16 BAC clone CIT987SK-44M2, complete sequence. 0.0 
[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YGR276c] 4e-30 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR107w] 3e-13 
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YGL094C] 1e-10 
[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. cerevisiae, YGL094C] 1e-10 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YOL080c] 2e-10 
[PROSITE] MYRISTYL 5 
[PROSITE] AMIDATION 1 
[PROSITE] CK2_PHOSPHO_SITE 8 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] GLYCOSAMINOGLYCAN 1 
[PROSITE] PKC_PHOSPHO_SITE 16 
[PROSITE] ASN_GLYCOSYLATION 2 
[PFAM] RNA recognition motif. (aka RRM, RBD, or RNP domain) 
[KW] Alpha_Beta 



SEQ MEPEREGTERHPRKVRESRQAPNKLVGAAEAMKAGWDLEESQPEAKKARLSTILFTDNCE 

PRD ccchhhhhccccchhhhhhhhcchhhhhhhhhhccccccccccchhhhhhccccccccce 

SEQ VTHDQLCELLKYAVLGKSNVPKPSWCQLFHQNHLNNVVVFVLQGMSQLHFYRFYLEFGCL 

PRD eehhhhhhhhhhhhhcccccccccceeeeccccccceeeeeeecchhhhhhhhhhhhhhh 

SEQ RKAFRHKFRLPPPSSDFLADVVGLQTEQRAGDLPKTMEGPLPSNAKAAINLQDDPIIQKY 

PRD hhhhhhhhccccccccchhhhhhhhhhhhccccccccccccccchhhhhhhhcccccccc 

SEQ GSKKVGLTRCLLTKEEMRTFHFPLQGFPDCENFLLTKCNGSIADNSPLFGLDCEMCLTSK 

PRD ccccccchhhhhhhhhhhhhhccccccccccceeeeccccccccccceeeeccccccccc 

SEQ GRELTRISLVAEGGCCVMDELVKPENKILDYLTSFSGITKKILNPVTTKLKDVQRQLKAL 

PRD cchhhhheeeecccceeeeeeeccccceeecccccccccccccccccchhhhhhhhhhhh 
SEQ LPPDAVLVGHSLDLDLRALKMIHPYVIDTSLLYVREQGRRFKLKFLAKVILGKDIQCPDR 

PRD hccceeeecccchhhhhhhhhhhhccccceeeeccccccchhhhhhhhhhhhhhcccccc 

SEQ LGHDATEDARTILELARYFLKHGPKKIAELNLEALANHQEIQAAGQEPKNTAEVLQHPNT 

PRD ccccchhhhhhhhhhhhhhhhcccceeeeehhhhhhhhhhhhhhccccccceeeeecccc 

SEQ SVLECLDSVGQKLLFLTRETDAGELPSSRNCQTIKCLSNKEVLEQARVEIPLFPFSIVQF 

PRD ceeeeeeccccceeeeeecccccccccccccceeeeecchhhhhhhhhhccccccceeee 

SEQ SFKAFSPVLTEEMNKRMRIKWTEISTVYAGPFSKNCNLRALKRLFKSFGPVQSMTFVLET 

PRD eeeceeeehhhhhhhhhhhhheeeeeecccccccchhhhhhhhhhhccccceeeehhhhh 

SEQ RQVQRPVTELTLDCDTLVNELEGDSENQGSIYLSGVSETFKEQLLQEPRLFLGLEAVILP 

PRD cccccccccccccchhhhhhcccccccccccccccchhhhhhhhhhhhcccccceeeeec 

SEQ KDLKSGKQKKYCFLKFKSFGSAQQALNILTGKDWKLKGRHALTPRHLHAWLRGLPPESTR 

PRD ccccccccceeeeeeeecccchhhhhhhhhccccccccccccccchhhhhhccccccccc 

SEQ LPGLRVVPPPFEQEALQTLKLDHPKIAAWRWSRKIGKLYNSLCPGTLCLILLPGTKSTHG 

PRD ccccccccccchhhhhhhhhhcchhhhhhhhhhhhhheeeeccccceeeeeccccccccc 

SEQ SLSGLGLMGIKEEEESAGPGLCS 

PRD cccccccchhhhhhccccccccc 



Prosite for DKFZphtes3_15j3.2 

PS00001 219->223 ASN_GLYCOSYLATION PDOC00001 
PS00001 419->423 ASN_GLYCOSYLATION PDOC00001 
PS00002 723->727 GLYCOSAMINOGLYCAN PDOC00002 
PS00005 8->11 PKC_PHOSPHO_SITE PDOC00005 
PS00005 182->185 PKC_PHOSPHO_SITE PDOC00005 
PS00005 238->241 PKC_PHOSPHO_SITE PDOC00005 
PS00005 279->282 PKC_PHOSPHO_SITE PDOC00005 
PS00005 287->290 PKC_PHOSPHO_SITE PDOC00005 
PS00005 447->450 PKC_PHOSPHO_SITE PDOC00005 
PS00005 453->456 PKC_PHOSPHO_SITE PDOC00005 
PS00005 458->461 PKC_PHOSPHO_SITE PDOC00005 
PS00005 481->484 PKC_PHOSPHO_SITE PDOC00005 
PS00005 579->582 PKC_PHOSPHO_SITE PDOC00005 
PS00005 605->608 PKC_PHOSPHO_SITE PDOC00005 
PS00005 630->633 PKC_PHOSPHO_SITE PDOC00005 
PS00005 643->646 PKC_PHOSPHO_SITE PDOC00005 
PS00005 658->661 PKC_PHOSPHO_SITE PDOC00005 
PS00005 678->681 PKC_PHOSPHO_SITE PDOC00005 
PS00005 692->695 PKC_PHOSPHO_SITE PDOC00005 
PS00006 41->45 CK2_PHOSPHO_SITE PDOC00006 
PS00006 193->197 CK2_PHOSPHO_SITE PDOC00006 
PS00006 221->225 CK2_PHOSPHO_SITE PDOC00006 
PS00006 371->375 CK2_PHOSPHO_SITE PDOC00006 
PS00006 421->425 CK2_PHOSPHO_SITE PDOC00006 
PS00006 458->462 CK2_PHOSPHO_SITE PDOC00006 
PS00006 579->583 CK2_PHOSPHO_SITE PDOC00006 
PS00006 630->634 CK2_PHOSPHO_SITE PDOC00006 
PS00007 370->379 TYR_PHOSPHO_SITE PDOC00007 
PS00008 21->33 MYRISTYL PDOC00008 
PS00008 186->192 MYRISTYL PDOC00008 
PS00008 575->581 MYRISTYL PDOC00008 
PS00008 714->720 MYRISTYL PDOC00008 
PS00008 720->726 MYRISTYL PDOC00008 
PS00009 337->341 AMIDATION PDOC00009 



Pfam for DKFZphtes3_15j3.2 

HMM_NAME RNA recognition motif. (aka RRM, RBD, or RNP domain) 

HMM       *IYVGNLPWDtTEEDLrDlFsQFGpIvsIrMMrDReTGRSRGFAFVEFED 
           IY+ +++ +T +E+L + + F + + + +++D G+ + ++F +F++ 
Query 571  IYLSGVS-ETFKEQLLQEPRLFLGLEAVILPKDLKSGKQKKYCFLKFKS 618 

HMM        EEDAekAIdeMNG..meFmGRrlRV* 
           +A+ A+ + G ++ GR + 
Query 619  FGSAQQALNILTGKDWKLKGRHALT 643 






DKFZphtes3_15k11 



group: signal transduction 

DKFZphtes3_15k11 encodes a novel 958 amino acid protein whose C-terminal region is identical to 
the human KIAA0781 protein and which shows high similarity to protein kinases. 

The novel protein contains a protein kinase ATP-binding region signature and a 
serine/threonine protein kinase active-site signature. The related murine kinase was cloned 
from the myocardium of the developing heart. 

The new protein can find application in the modulation of intracellular signal pathways that 
depend on this kinase. 
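
The catalytic-loop motif behind the serine/threonine kinase active-site signature can be located in the peptide with a simple pattern search. The sketch below is illustrative only: the regular expression is a simplified stand-in for the conserved core (His/Tyr, Asp, Lys and Asn of the catalytic loop) and is not the PROSITE definition, which should be taken from the database itself. The segment is row 151 of the frame 1 peptide listing, and `find_core` is a hypothetical helper.

```python
import re

# Residues 151-200 of the frame 1 peptide, as printed in the peptide listing.
SEGMENT_START = 151
SEGMENT = "ARRKFWQILSAVDYCHGRKIVHRDLKAENLLLDNNMNIKIADFGFGNFFK"

# Simplified core of the Ser/Thr kinase catalytic loop (assumption, not the
# PROSITE pattern): [HY]-x-D-[LIVMFY]-K-x(2)-N
CORE = re.compile(r"[HY].D[LIVMFY]K.{2}N")


def find_core(segment, start):
    """Return (1-based position, matched text) for each occurrence of the core motif."""
    return [(start + m.start(), m.group()) for m in CORE.finditer(segment)]


print(find_core(SEGMENT, SEGMENT_START))
```

The Lys two residues after the catalytic Asp is what distinguishes the serine/threonine-type loop in this simplified pattern.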

KIAA0781, 5' extension 

complete cDNA, complete cds, potential start at Bp 97, EST hits 
Sequenced by GBF 
Locus: /map="11" 
Insert length: 4868 bp 

Poly A stretch at pos. 4798, polyadenylation signal at pos. 4776 



1 GAGCAAGCGG AGCGGCCGTC GCCCAAGCCA AGCCGCGCTG CCAACCCTCC 
51 CGCCCGCCCG CGCTCCTGTC CGCCGTGTCT AGCAGCGGGG CCCAGCATGG 
101 TCATGGCGGA TGGCCCGAGG CACTTGCAGC GCGGGCCGGT CCGGGTGGGG 
151 TTCTACGACA TCGAGGGCAC GCTGGGCAAG GGCAACTTCG CTGTGGTGAA 
201 GCTGGGGCGG CACCGGATCA CCAAGACGGA GGTGGCAATA AAAATAATCG 
251 ATAAGTCTCA GCTGGATGCA GTGAACCTTG AGAAAATCTA CCGAGAAGTA 
301 CAAATAATGA AAATGTTAGA CCACCCTCAC ATAATCAAAC TTTATCAGGT 
351 AATGGAGACC AAAAGTATGT TGTACCTTGT GACAGAATAT GCCAAAAATG 
401 GAGAAATTTT TGACTATCTT GCTAATCATG GCCGGTTAAA TGAGTCTGAA 
451 GCCAGGCGAA AATTCTGGCA AATCCTGTCT GCTGTTGATT ATTGTCATGG 
501 TCGGAAGATT GTGCACCGTG ACCTCAAAGC TGAAAATCTC CTGCTGGATA 
551 ACAACATGAA TATCAAAATA GCAGATTTCG GTTTTGGAAA TTTCTTTAAA 
601 AGTGGTGAAC TGCTGGCAAC ATGGTGTGGC AGCCCCCCTT ATGCAGCCCC 
651 AGAAGTCTTT GAAGGGCAGC AGTATGAAGG ACCACAGCTG GACATCTGGA 
701 GTATGGGAGT TGTTCTTTAT GTCCTTGTCT GTGGAGCTCT GCCCTTTGAT 
751 GGACCGACTC TTCCAATTTT GAGGCAGAGG GTTCTGGAAG GAAGATTCCG 
801 GATTCCGTAT TTCATGTCAG AAGATTGCGA GCACCTTATC CGAAGGATGT 
851 TGGTCCTAGA CCCATCCAAA CGGCTAACCA TAGCCCAAAT CAAGGAGCAT 
901 AAATGGATGC TCATAGAAGT TCCTGTCCAG AGACCTGTTC TCTATCCACA 
951 AGAGCAAGAA AATGAGCCAT CCATCGGGGA GTTTAATGAG CAGGTTCTGC 
1001 GACTGATGCA CAGCCTTGGA ATAGATCAGC AGAAAACCAT TGAGTCTTTG 
1051 CAGAACAAGA GCTATAACCA CTTTGCTGCC ATTTATTTCT TGTTGGTGGA 
1101 GCGCCTGAAA TCACATCGGA GCAGTTTCCC AGTGGAGCAG AGACTTGATG 
1151 GCCGCCAGCG TCGGCCTAGC ACCATTGCTG AGCAAACAGT TGCCAAGGCA 
1201 CAGACTGTGG GGCTCCCAGT GACCATGCAT TCACCGAACA TGAGGCTGCT 
1251 GCGATCTGCC CTCCTCCCCC AGGCATCCAA CGTGGAGGCC TTTTCATTTC 
1301 CAGCATCTGG CTGTCAGGCG GAAGCTGCAT TCATGGAAGA AGAGTGTGTG 
1351 GACACTCCAA AGGTCAATGG CTGTCTGCTT GACCCTGTGC CTCCTGTCCT 
1401 GGTGCGGAAG GGATGCCAGT CACTGCCCAG CAACATGATG GAGACCTCCA 
1451 TTGACGAAGG GCTGGAGACA GAAGGAGAGG CCGAGGAAGA CCCCGCTCAT 
1501 GCCTTTGAGG CATTTCAGTC CACACGCAGC GGGCAGAGAC GGCACACTCT 
1551 GTCAGAAGTG ACCAATCAAC TGGTCGTGAT GCCTGGGGCA GGGAAAATTT 
1601 TCTCCATGAA TGACAGCCCC TCCCTTGACA GTGTGGACTC TGAGTATGAT 
1651 ATGGGGTCTG TTCAGAGGGA CCTGAACTTT CTGGAAGACA ACCCTTCCCT 
1701 TAAGGACATC ATGTTAGCCA ATCAGCCTTC ACCCCGCATG ACATCTCCCT 
1751 TCATAAGCCT GAGACCTACC AACCCAGCCA TGCAGGCTCT GAGCTCCCAG 
1801 AAACGAGAGG TCCACAACAG GTCTCCAGTG AGCTTCAGAG AGGGCCGCAG 
1851 AGCATCAGAT ACCTCCCTCA CCCAGGGAAT TGTAGCATTT AGACAACATC 
1901 TTCAGAATCT GGCTAGAACC AAAGGAATTC TAGAGTTGAA CAAAGTGCAG 
1951 TTGTTGTATG AACAAATAGG ACCGGAGGCA GACCCTAACC TGGCGCCGGC 
2001 GGCTCCTCAG CTCCAGGACC TTGCTAGCAG CTGCCCTCAG GAAGAAGTTT 
2051 CTCAGCAGCA GGAAAGCGTC TCCACTCTCC CTGCCAGCGT GCATCCCCAG 
2101 CTGTCCCCAC GGCAGAGCCT GGAGACCCAG TACCTGCAGC ACAGACTCCA 
2151 GAAGCCCAGC CTTCTGTCAA AGGCCCAGAA CACCTGTCAG CTTTATTGCA 
2201 AAGAACCACC GCGGAGCCTT GAGCAGCAGC TGCAGGAACA TAGGCTCCAG 
2251 CAGAAGCGAC TCTTTCTTCA GAAGCAGTCT CAACTGCAGG CCTATTTTAA 
2301 TCAGATGCAG ATAGCAGAGA GCTCCTACCC ACAGCCAAGT CAGCAGCTGC 
2351 CCCTTCCCCG CCAGGAGACT CCACCGCCTT CTCAGCAGGC CCCACCGTTC 
2401 AGCCTGACCC AGCCCCTGAG CCCCGTCCTG GAGCCTTCCT CCGAGCAGAT 
2451 GCAATACAGC CCTTTCCTCA GCCAGTACCA AGAGATGCAG CTTCAGCCCC 
2501 TGCCCTCCAC TTCCGGTCCC CGGGCTGCTC CTCCTCTGCC CACGCAGCTA 
2551 CAGCAGCAGC AGCCGCCACC GCCACCACCC CCTCCACCAC CACGACAGCC 
2601 AGGAGCTGCC CCAGCCCCCT TACAGTTCTC CTATCAGACT TGTGAGCTGC 
2651 CAAGCGCTGC TTCCCCTGCG CCAGACTATC CCACTCCCTG TCAGTATCCT 
2701 GTGGATGGAG CCCAGCAGAG CGACCTAACG GGGCCAGACT GTCCCAGAAG 
2751 CCCAGGACTG CAAGAGGCCC CCTCCAGCTA CGACCCACTA GCCCTCTCTG 
2801 AGCTACCTGG ACTCTTTGAT TGTGAAATGC TAGACGCTGT GGATCCACAA 
2851 CACAACGGGT ATGTCCTGGT GAATTAGTCT CAGCACAGGA ATTGAGGTGG 
2901 GTCAGGTGAA GGAAGAGTGT ATGTTCCTAT TTTTATTCCA GCCTTTTAAA 
2951 TTTAAAGCTT ATTTTCTTGC CCTCTCCCTA ACGGGGAGAA ATCGAGCCAC 
3001 CCAACTGGAA TCAGAGGGTC TGGCTGGGGT GGATGTTGCT TCCTCCTGGT 
3051 TCTGCCCCAC CACAAAGTTT TCTGTGGCAA GTGCTGGAAC ATAGTTGTAG 
3101 GCTGAGGCTC CTGCCCTTCG GTCGAGTGGA GCAAGCTCTC GAGGGCAGCA 
3151 CTGACAAATG TGTTCCTAAG AAGACATTCA GACCCAGGTC TTATGCAGGA 
3201 TTACATCCGT TTATTATCAA GGGCAACCTT GGTGAAAGCA GAAAGGGTGT 
3251 GTGCTATTGC ATATATATGG GGGAAAAGGC AATATATTTT TCACTGAAGC 
3301 TGAGCAACCA CATATTGCTA CAAGGCAAAT CAAGAAGACA TCAGGAAATC 
3351 AGATGCACAG GAAATAAAGG AAAGCTGTGC TTTGTCATTG AATCCTAAGT 
3401 TCTTAGCTGC TGATGCAAGT TGTCCCCCAA GGCCATCACA AAGCAGTGGG 
3451 GCATGAGCTG TGTTTCAGGG GCCACTAAAT AACAGCTGGT ACTGACCCCA 
3501 GAAACCGCCT TCATCTCCAT TCGGAAGCAG GTGACACACC CCTTCAGAAG 
3551 GTGCCCTGGG TTGCCGAGTG TCAGAATATA CTCAGGACTC CAGAGGTGTC 
3601 ACACGTGGAA CTGACAGGAG ACCCGCCACC GTGGAGGCAG GGGGCAAGAA 
3651 ACTCAAGAAC GCATCAAGAG CACCAGCCCT GGGCCAGGGA AGACAGGCTC 
3701 TTCCTGCAGT TTCTCGTGGA CACTGCTGGC TTGCGGGCAG TCGGTCTCCA 
3751 GGGTACCTGT TGTCTCTTTT CCGATGTAAT AACTACTTTG ACCTTACACT 
3801 ATATGTTGCT AGTAGTTTAT TGAGCTTTGT ATATTTGGAC AGTTTCATAT 
3851 AGGGCTTAGA GATTTTAAGG ACATGATAAA TGAACTTTTC TGTCCCATGT 
3901 GAAGTGGTAG TGCGGTGCCT TTCCCCCAGA TCATGCTTTA ATTCTTTCTT 
3951 TTCTGTAGAA ACCAACAGTT TCCATTTATG TCAATGCTAA ATCCAAAGTC 
4001 ACTTCAGAGT TTGTTTTCCA CCATGTGGGA ATCAGCATTC TTAATTTCGT 
4051 TAAAGTTTTG ACTTGTAATG AAATGTTCAA GTATTACAGC AATATTCAAA 
4101 GAAAGAACCA CAGATGTGTT AACCATTTAA GCAGATCATC TGCCAAACAT 
4151 TATATTACTA ATAAAACTTA ACCAACAGTT ACAATTCAGT CATCAAAGTA 
4201 AGTAAAAATT AGATGCTACA GCTAGCTAAC TGTATCCCTA GAAATGATGA 
4251 ATAATTTGCC ATTTGGACAG TTAACATCCA GGTGTTACAA AGTCAGTGTT 
4301 AATTCTAAAG ATGATCATTT CTGCCCTTTA GAATGGCTTG TCCCATCAGC 
4351 AGATGAATGT GTTAAGCACA AAGCATCTTC CTTAAAGCAC AAAGAGAGGG 
4401 ACTAACTGAT GCTGCATCTA GAAAACACCT TTAAGTTGCC TTTCCTCTTT 
4451 GTAGTTAGCG TTCAGGCAGG TGACGTGTGG AAAGTCTAGG GGGTTCCATT 
4501 CTGGCCATGC GAGCCCAGCT CCTACCAACG TCGGTAACTT GAGCAGTCCC 
4551 TGTTGCTGGC CAGAGACTGC CTGGTCGCCA GCGCTCACCA TGGGTGCCAG 
4601 GATGCTTCGC AGAGGCACTG TGCTCACGGT TGGACTTGGT GTCAGTGGGA 
4651 AAGGGCAGTG TGGGGACTGT CATTTTTGTG ATTTAATAAC ACACAGTGAA 
4701 AATCCAGGAA GAATGAATTA AGCTTCTTCT GGGAGTTGTT TATTCCTGCT 
4751 CGTGCTTAAG ATTGATGATT TCGTGAAATA AAGAACATCA TTTCATTTAA 
4801 AAAAAAAAAA AAAAAAAGGG CGGCCGCTCT AGAGGATCCA AGCTTACGTA 
4851 CGCGTGAAAA AAAAAAAG 



BLAST Results 



Entry HSG4921 from database EMBL: 
human STS SHGC-37164. 
Score = 1605, P = 1.9e-66, identities = 349/369 

Entry AB018324 from database EMBL: 

Homo sapiens mRNA for KIAA0781 protein, partial cds. 
Score = 10725, P = 0.0e+00, identities = 2145/2145 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from the beginning to 2874 bp; peptide length: 959 
Category: known protein 



1 EQAERPSPKP SRAANPPARP RSCPPCLAAG PSMVMADGPR HLQRGPVRVG 
51 FYDIEGTLGK GNFAVVKLGR HRITKTEVAI KIIDKSQLDA VNLEKIYREV 
101 QIMKMLDHPH IIKLYQVMET KSMLYLVTEY AKNGEIFDYL ANHGRLNESE 
151 ARRKFWQILS AVDYCHGRKI VHRDLKAENL LLDNNMNIKI ADFGFGNFFK 
201 SGELLATWCG SPPYAAPEVF EGQQYEGPQL DIWSMGVVLY VLVCGALPFD 
251 GPTLPILRQR VLEGRFRIPY FMSEDCEHLI RRMLVLDPSK RLTIAQIKEH 
301 KWMLIEVPVQ RPVLYPQEQE NEPSIGEFNE QVLRLMHSLG IDQQKTIESL 
351 QNKSYNHFAA IYFLLVERLK SHRSSFPVEQ RLDGRQRRPS TIAEQTVAKA 
401 QTVGLPVTMH SPNMRLLRSA LLPQASNVEA FSFPASGCQA EAAFMEEECV 
451 DTPKVNGCLL DPVPPVLVRK GCQSLPSNMM ETSIDEGLET EGEAEEDPAH 
501 AFEAFQSTRS GQRRHTLSEV TNQLVVMPGA GKIFSMNDSP SLDSVDSEYD 
551 MGSVQRDLNF LEDNPSLKDI MLANQPSPRM TSPFISLRPT NPAMQALSSQ 
601 KREVHNRSPV SFREGRRASD TSLTQGIVAF RQHLQNLART KGILELNKVQ 
651 LLYEQIGPEA DPNLAPAAPQ LQDLASSCPQ EEVSQQQESV STLPASVHPQ 
701 LSPRQSLETQ YLQHRLQKPS LLSKAQNTCQ LYCKEPPRSL EQQLQEHRLQ 
751 QKRLFLQKQS QLQAYFNQMQ IAESSYPQPS QQLPLPRQET PPPSQQAPPF 
801 SLTQPLSPVL EPSSEQMQYS PFLSQYQEMQ LQPLPSTSGP RAAPPLPTQL 
851 QQQQPPPPPP PPPPRQPGAA PAPLQFSYQT CELPSAASPA PDYPTPCQYP 
901 VDGAQQSDLT GPDCPRSPGL QEAPSSYDPL ALSELPGLFD CEMLDAVDPQ 
951 HNGYVLVN 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15k11, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_15k11, frame 1 

Report for DKFZphtes3_15k11.1 

[LENGTH] 926 
[MW] 103915.77 
[pI] 5.70 
[HOMOL] TREMBL:AB018324_1 gene: "KIAA0781"; product: "KIAA0781 protein"; Homo sapiens mRNA for KIAA0781 protein, partial cds 0.0 
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YDR477w] 8e-76 
[FUNCAT] 11.01 stress response [S. cerevisiae, YDR477w] 8e-76 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDR477w] 8e-76 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YCL024w] 4e-58 
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YDR507c] 3e-56 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YDR507c] 3e-56 
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YDR122w] 1e-53 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YKL101w] 3e-53 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL101w] 3e-53 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPL141c] 5e-51 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YPL153c] 3e-42 
[FUNCAT] 03.22.01 cell cycle check point proteins [S. cerevisiae, YPL153c] 3e-42 
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YPL153c] 3e-42 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YPL153c] 3e-42 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YFR014c] 5e-42 
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YMR001c] 2e-34 
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YGL180w] 1e-27 
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YGL180w] 1e-27 
[FUNCAT] 06.13.04 lysosomal and vacuolar degradation [S. cerevisiae, YGL180w] 1e-27 
[FUNCAT] 10.02.11 key kinases [S. cerevisiae, YBL105c] 3e-26 
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YER129w] 3e-26 
[FUNCAT] 02.19 metabolism of energy reserves (glycogen, trehalose) [S. cerevisiae, YPL031C] 1e-23 
[FUNCAT] 01.04.04 regulation of phosphate utilization [S. cerevisiae, YPL031c] 1e-23 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YPL031c] 1e-23 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YOR351c] 2e-23 
[FUNCAT] 10.05.11 key kinases [S. cerevisiae, YHL007c] 8e-21 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YHL007c] 8e-21 
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YPL140c] 2e-20 
[FUNCAT] 10.03.11 key kinases [S. cerevisiae, YLR113w] 7e-20 
[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YDL108w] 3e-19 
[FUNCAT] 10.05.09 regulation of g-protein activity [S. cerevisiae, YBL016w] 2e-18 
[FUNCAT] 10.04.11 key kinases [S. cerevisiae, YLR362w] 3e-18 
[FUNCAT] 04.03.99 other trna-transcription activities [S. cerevisiae, YOR061w] 4e-18 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YFL033c] 4e-17 
[FUNCAT] 05.07 translational control [S. cerevisiae, YDR283c] 2e-16 
[FUNCAT] 01.02.04 regulation of nitrogen and sulphur utilization [S. cerevisiae, YNL183c] 2e-14 
[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YNL183C] 2e-14 
[FUNCAT] 09.04 biogenesis of cytoskeleton [S. cerevisiae, YNL020C] 5e-14 
[FUNCAT] c energy conversion [M. genitalium, MG109] 2e-12 
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YBR097w] 1e-10 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YBR097w] 1e-10 
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YBR097w] 1e-10 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YBR097w] 1e-10 
[FUNCAT] 10.04.99 other nutritional-response activities [S. cerevisiae, YJR059w] 4e-09 
[FUNCAT] 01.06.10 regulation of lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YHR079c] 1e-07 
[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YHR079c] 1e-07 
[FUNCAT] 08.19 cellular import [S. cerevisiae, YNL154c] 2e-04 

[BLOCKS] BL00415A Synapsins proteins 

[BLOCKS] BL00239B Receptor tyrosine kinase class II proteins 

[BLOCKS] BL00107A Protein kinases ATP-binding region proteins 

[SCOP] d1gol 5.1.1.1.9 MAP kinase Erk2 [rat Rattus norvegicus 3e-78 
[SCOP] d1wfc 5.1.1.1.8 MAP kinase p38 [human (Homo sapiens) 1e-81 
[SCOP] d1koa_2 5.1.1.1.7 (1-350) Twitchin, kinase domain [Caenorhabditi 5e-89 
[SCOP] d1koba_ 5.1.1.1.6 Twitchin, kinase domain [California sea har 5e-86 
[SCOP] d1phk 5.1.1.1.5 gamma-subunit of glycogen phosphorylase kinas 3e-80 
[SCOP] d1irk 5.1.1.2.4 insulin receptor [Human (Homo sapiens) 6e-70 
[SCOP] d1apme_ 5.1.1.1.4 cAMP-dependent PK, catalytic subunit [mouse (Mu 1e-95 
[SCOP] d1fgka_ 5.1.1.2.3 Fibroblast growth factor receptor 1 [human (Hom 7e-71 
[SCOP] d1ydse_ 5.1.1.1.3 cAMP-dependent PK, catalytic subunit [bovine (Bo 2e-96 
[SCOP] d1fmk_3 5.1.1.2.2 (168-437) c-src tyrosine kinase [human (Hom 2e-72 
[SCOP] d1cdka_ 5.1.1.1.2 cAMP-dependent PK, catalytic subunit [pig (Su 5e-97 
[SCOP] d2hckb3 5.1.1.2.1 (167-437) Haemopoetic cell kinase Hck [huma 2e-68 
[SCOP] d1csn 5.1.1.1.11 Casein kinase-1, CK1 [Schizosaccharomyces pombe 3e-53 
[SCOP] d1jsua_ 5.1.1.1.1 Cyclin-dependent PK [Human (Homo sapiens) 3e-78 
[SCOP] d1ckia_ 5.1.1.1.10 Casein kinase-1, CK1 [rat (Rattus norvegicus) 1e-58 

[EC] 2.7.1.117 Myosin-light-chain kinase 3e-49 

[EC] 2.7.1.109 [Hydroxymethylglutaryl-CoA reductase (NADPH) ] kinase 4e-78 

[EC] 2.7.1.38 Phosphorylase kinase 3e-41 

[EC] 2.7.1.37 Protein kinase 7e-45 

[EC] 2.7.1.123 Ca2+/calmodulin-dependent protein kinase 5e-42 

[EC] 2.7.1.128 [Acetyl-CoA carboxylase] kinase 4e-78 

[PIRKW] phosphotransferase 3e-93 
[PIRKW] nucleus 2e-74 
[PIRKW] calcium 2e-40 
[PIRKW] transferase 3e-33 
[PIRKW] duplication 2e-32 
[PIRKW] tandem repeat 7e-45 
[PIRKW] phorbol ester binding 4e-33 
[PIRKW] zinc 4e-33 
[PIRKW] ion transport 1e-32 
[PIRKW] cell cycle control 1e-45 
[PIRKW] serine/threonine-specific protein kinase 2e-97 
[PIRKW] oncogene 1e-34 
[PIRKW] phospholipid binding 2e-32 
[PIRKW] autophosphorylation 2e-74 
[PIRKW] brain 6e-36 
[PIRKW] heterotetramer 8e-38 
[PIRKW] mitosis 1e-45 
[PIRKW] polymer 5e-41 
[PIRKW] magnesium 6e-80 
[PIRKW] ATP 2e-97 
[PIRKW] polyprotein 1e-34 
[PIRKW] alternative initiators 2e-31 
[PIRKW] phosphoprotein 2e-74 
[PIRKW] apoptosis 8e-38 
[PIRKW] cGMP binding 4e-33 
[PIRKW] glycoprotein 3e-36 
[PIRKW] skeletal muscle 8e-38 
[PIRKW] protein kinase 2e-50 
[PIRKW] testis 5e-41 
[PIRKW] cAMP binding 8e-38 
[PIRKW] transforming protein 4e-33 
[PIRKW] purine nucleotide binding 7e-52 
[PIRKW] calcium binding 7e-45 
[PIRKW] alternative splicing 5e-42 
[PIRKW] P-loop 7e-52 
[PIRKW] lipoprotein 8e-38 
[PIRKW] proto-oncogene 4e-33 
[PIRKW] segmentation 1e-34 
[PIRKW] core protein 1e-34 
[PIRKW] muscle 8e-38 
[PIRKW] myristylation 8e-38 
[PIRKW] EF hand 7e-45 
[PIRKW] cell division 3e-49 
[PIRKW] homodimer 1e-32 
[PIRKW] calmodulin binding 5e-42 
[SUPFAM] ribosomal protein S6 kinase II 1e-34 
[SUPFAM] calcium-dependent protein kinase 7e-45 
[SUPFAM] AMP-activated protein kinase 6e-80 
[SUPFAM] protein kinase akt 3e-36 
[SUPFAM] protein kinase SPK1 7e-41 
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 8e-99 
[SUPFAM] Ca2+/calmodulin-dependent protein kinase 5e-42 
[SUPFAM] calmodulin repeat homology 7e-45 
[SUPFAM] cAMP receptor protein cyclic nucleotide-binding domain homology 3e-33 
[SUPFAM] protein kinase DUN1 6e-36 
[SUPFAM] protein kinase C zeta 4e-33 
[SUPFAM] Dictyostelium cAMP-dependent protein kinase catalytic chain 2e-34 
[SUPFAM] death-associated protein kinase 8e-38 
[SUPFAM] pleckstrin repeat homology 3e-36 
[SUPFAM] ankyrin repeat homology 8e-38 
[SUPFAM] protein kinase homology 8e-99 
[SUPFAM] Ca2+/calmodulin-dependent protein kinase II 6e-38 
[SUPFAM] protein kinase C zinc-binding repeat homology 4e-33 
[SUPFAM] protein kinase C delta 2e-32 
[SUPFAM] cGMP-dependent protein kinase 3e-33 
[SUPFAM] protein kinase cdr1 1e-45 
[SUPFAM] kinase-related transforming protein 2e-50 
[SUPFAM] Ca2+/calmodulin-dependent protein kinase I 8e-42 
[SUPFAM] kinase interaction domain homology 7e-41 
[SUPFAM] gag-akt polyprotein 1e-34 
[PROSITE] PROTEIN_KINASE_ATP 1 
[PROSITE] MYRISTYL 3 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 4 
[PROSITE] CK2_PHOSPHO_SITE 15 
[PROSITE] TYR_PHOSPHO_SITE 2 
[PROSITE] PKC_PHOSPHO_SITE 10 
[PROSITE] ASN_GLYCOSYLATION 2 
[PROSITE] PROTEIN_KINASE_ST 1 
[PFAM] Eukaryotic protein kinase domain 
[KW] Irregular 
[KW] 3D 
[KW] LOW_COMPLEXITY 12.31 % 



SEQ MVMADGPRHLQRGPVRVGFYDIEGTLGKGNFAWKLGRHRITKTEVAIKIIDKSQLDAVN 

SEG 

IctpE EEECTTTEEEEEEEETTTTEEEEEEEEEHHHHHHHC 

SEQ LEKIYREVQIMKMLDHPHIIKLYQVMETKSMLYLVTEYAKNGEIFDYLANHGRLNESEAR 

SEG 

IctpE HHHHHHHHHHHHCCCTTTBCCEEEEEEETTEEEEEEECTTTTBHHHHHHHHCCCCHHHHH 

SEQ RKFWQILSAVDYCHGRKIVHRDLKAENLLLDNNMNIKIADFGFGNFFKSGELLATWCGSP 

SEG 

IctpE HHHHHHHHHHHHHHHCCEECCCCCGGGEEETTTTCEEECCTTTTEETT-TTBC-CCCCCG 

SEQ PYAAPEVFEGQQYEGPQLDIWSMGVVLYVLVCGALPFDGPTLPILRQRVLEGRFRIPYFM 

SEG 

IctpE GGCCHHHHHCCCBC-HHHHHHHHHHHHHHHHHCCTTTTTTTHHHHHHHHHHCCCCCTTTT 

SEQ SEDCEHLIRRMLVLDPSKRLTIAQIKEHKWMLIEVPVQRPVLYPQEQENEPSIGEFNEQV 

SEG 

IctpE CHHHHHHHHHTTTTTGGGTTTHHHHHHCGG 

SEQ LRLMHSLGIDQQKTIESLQNKSYNHFAAIYFLLVERLKSHRSSFPVEQRLDGRQRRPSTI 

SEG 

IctpE 

SEQ AEQTVAKAQTVGLPVTMHSPNMRLLRSALLPQASNVEAFSFPASGCQAEAAFMEEECVDT 

SEG 

IctpE 

SEQ PKVNGCLLDPVPPVLVRKGCQSLPSNMMETSIDEGLETEGEAEEDPAHAFEAFQSTRSGQ 

SEG xxxxxxxxxxx 

IctpE 

SEQ RRHTLSEVTNQLVVMPGAGKIFSMNDSPSLDSVDSEYDMGSVQRDLNFLEDNPSLKDIML 

SEG 

IctpE 






SEQ ANQPSPRMTSPFISLRPTNPAMQALSSQKREVHNRSPVSFREGRRASDTSLTQGIVAFRQ 

SEG 

IctpE 

SEQ HLQNLARTKGILELNKVQLLYEQIGPEADPNLAPAAPQLQDLASSCPQEEVSQQQESVST 

SEG xxxxxxxxxxxxxxxx. . . .xxxxxxxxxxxx . 

IctpE 

SEQ LPASVHPQLSPRQSLETQYLQHRLQKPSLLSKAQNTCQLYCKEPPRSLEQQLQEHRLQQK 

SEG xxxxxxxxxxxxx 

IctpE 

SEQ RLFLQKQSQLQAYFNQMQIAESSYPQPSQQLPLPRQETPPPSQQAPPFSLTQPLSPVLEP 

SEG xxxxxxxxxxx xxxxxxxxxxxxxxx 

IctpE 

SEQ SSEQMQYSPFLSQYQEMQLQPLPSTSGPRAAPPLPTQLQQQQPPPPPPPPPPRQPGAAPA 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

IctpE 

SEQ PLQFSYQTCELPSAASPAPDYPTPCQYPVDGAQQSDLTGPDCPRSPGLQEAPSSYDPLAL 

SEG xxx 

IctpE 

SEQ SELPGLFDCEMLDAVDPQHNGYVLVN 

SEG 

IctpE 



Prosite for DKFZphtes3_15k11.1 



PS00001 115->119 ASN_GLYCOSYLATION PDOC00001 
PS00001 320->324 ASN_GLYCOSYLATION PDOC00001 
PS00004 258->262 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 355->359 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 481->485 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 584->588 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 257->260 PKC_PHOSPHO_SITE PDOC00005 
PS00005 339->342 PKC_PHOSPHO_SITE PDOC00005 
PS00005 420->423 PKC_PHOSPHO_SITE PDOC00005 
PS00005 475->478 PKC_PHOSPHO_SITE PDOC00005 
PS00005 534->537 PKC_PHOSPHO_SITE PDOC00005 
PS00005 545->548 PKC_PHOSPHO_SITE PDOC00005 
PS00005 554->557 PKC_PHOSPHO_SITE PDOC00005 
PS00005 567->570 PKC_PHOSPHO_SITE PDOC00005 
PS00005 579->582 PKC_PHOSPHO_SITE PDOC00005 
PS00005 670->673 PKC_PHOSPHO_SITE PDOC00005 
PS00006 42->46 CK2_PHOSPHO_SITE PDOC00006 
PS00006 54->58 CK2_PHOSPHO_SITE PDOC00006 
PS00006 128->132 CK2_PHOSPHO_SITE PDOC00006 
PS00006 292->296 CK2_PHOSPHO_SITE PDOC00006 
PS00006 359->363 CK2_PHOSPHO_SITE PDOC00006 
PS00006 394->398 CK2_PHOSPHO_SITE PDOC00006 
PS00006 450->454 CK2_PHOSPHO_SITE PDOC00006 
PS00006 458->462 CK2_PHOSPHO_SITE PDOC00006 
PS00006 484->488 CK2_PHOSPHO_SITE PDOC00006 
PS00006 503->507 CK2_PHOSPHO_SITE PDOC00006 
PS00006 515->519 CK2_PHOSPHO_SITE PDOC00006 
PS00006 534->538 CK2_PHOSPHO_SITE PDOC00006 
PS00006 579->583 CK2_PHOSPHO_SITE PDOC00006 
PS00006 878->882 CK2_PHOSPHO_SITE PDOC00006 
PS00006 893->897 CK2_PHOSPHO_SITE PDOC00006 
PS00007 672->680 TYR_PHOSPHO_SITE PDOC00007 
PS00007 100->108 TYR_PHOSPHO_SITE PDOC00007 
PS00008 372->378 MYRISTYL PDOC00008 
PS00008 871->877 MYRISTYL PDOC00008 
PS00008 905->911 MYRISTYL PDOC00008 
PS00009 134->138 AMIDATION PDOC00009 
PS00009 582->586 AMIDATION PDOC00009 
PS00107 26->50 PROTEIN_KINASE_ATP PDOC00100 
PS00108 138->151 PROTEIN_KINASE_ST PDOC00100 



Pfam for DKFZphtes3_15k11.1 

HMM_NAME Eukaryotic protein kinase domain 

HMM *YeigRiIGeGsFGtVYkCiWr.TGeIVAIKIIkkrsms FlREI 

Y I++++G+G+F++V+++++R T +VAIKII+K++++ + RE+ 

Query 20 YDIEGTLGKGNFAWKLGRHRITKTEVAIKIIDKSQLDAVNLEKIYREV 68 

HMM qlMRrLnHPNIIRFYDwFedddDHIYMIMEYMeGGDLFDYIrrngpMsEw 

QIM++L+HP+II++Y ++E +++ +Y+++EY+ +G++FDY+ ++G+++E 
Query 69 QIMKMLDHPHIIKLYQVME-TKSMLYLVTEYAKNGEIFDYLANHGRLNES 117 

HMM elrf IMyQILrGMeYLHSMgllHRDLKPENILIDeNgqIKIcDFGLARqM 

E+R+ ++QIL++++Y+H ++I+HRDLK+EN+L+D+N++IKI+DFG+ ++ 
Query 118 EARRKFWQILSAVDYCHGRKIVHRDLKAENLLLDNNMNIKIADFGFGNFF 167 

HMM nnYerMttf CGTPWYMMAPEVIIrag . nyYttkVDMWSFGCILWEMMTGep 

+++E++ T CG+P+Y APEV +G +Y +++ D+WS+G++L+ +++G + 
Query 168 KSGELLATWCGSPPYA-APEV-FEGQQYEGPQLDIWSMGVVLYVLVCGAL 215 

HMM PFyddnMemlmrliqrfrrpfWpnCSeElyDFMrwCWnyDPekRPTFrQI 
PF++ ++ + + +++ R+++++ +SE++ +++R+++ +DP+KR+T+ QI 
Query 216 PFDGPTLPILRQRVLEGRFRIPYFMSEDCEHLIRRMLVLDPSKRLTIAQI 265 

HMM LnHPWF* 
+H W+ 

Query 266 KEHKWM 271 
DKFZphtes3_17f10 

group: testes derived 

DKFZphtes3_15j18 encodes a novel 710 amino acid protein with weak similarity to neurofilament 
proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to neurofilament proteins 

Sequenced by GBF 

Locus: unknown 

Insert length: 2533 bp 

Poly A stretch at pos. 2507, no polyadenylation signal found 

1 CTTCAGTTCA ACTAAAAATG GACAGATCTC AGCAGACCAG CCGTACAGGA 
51 TACTGGACCA TGATGAACAT CCCCCCTGTA GAAAAAGTGG ACAAGGAACA 

101 ACAGACATAC TTTAGTGAAT CAGAAATAGT GGTTATTTCC AGGCCAGATA 

151 GTTCTTCTAC AAAGTCAAAG GAAGATGCCC TGAAACATAA ATCGTCGGGA 

201 AAGATTTTTG CTAGTGAACA CCCTGAATTT CAACCAGCAA CAAACAGCAA 

251 TGAAGAAATT GGGCAGAAAA ATATCAGCAG AACTTCATTT ACTCAGGAGA 

301 CTAAAAAAGG TCCCCCAGTA CTTTTAGAAG ATGAGCTTAG GGAAGAAGTA 

351 ACTGTACCTG TTGTACAAGA AGGTTCTGCT GTTAAAAAAG TGGCTTCTGC 

401 TGAAATAGAG CCTCCATCAA CAGAAAAATT CCCAGCTAAA ATACAGCCTC 

451 CATTAGTTGA AGAGGCCACT GCTAAAGCGG AGCCCAGACC TGCTGAAGAG 

501 ACCCATGTCC AAGTACAGCC ATCAACTGAA GAGACTCCTG ATGCTGAGGC 

551 AGCCACTGCA GTTGCGGAGA ATTCTGTTAA AGTTCAGCCT CCACCTGCTG 

601 AAGAGGCCCC TTTAGTGGAG TTTCCTGCTG AAATTCAGCC TCCATCAGCT 

651 GAAGAGTCTC CTTCTGTAGA GCTTCTGGCT GAAATTCTGC CTCCATCAGC 

701 TGAAGAGTCC CCTTCAGAAG AGCCTCCTGC TGAAATTCTG CCTCCACCAG 

751 CTGAAAAGTC TCCTTCAGTA GAGCTTCTTG GTGAAATTCG GTCTCCCTCA 

801 GCACAAAAGG CTCCCATTGA AGTACAGCCT TTACCAGCTG AGGGCGCCCT 

851 TGAAGAGGCC CCAGCTAAAG TAGAGCCTCC CACTGTTGAA GAGACCCTTG 

901 CTGAAGTTCA GCCTCTATTA CCTGAAGAGG CTCCTAGAGA AGAGGCTCGA 

951 GAACTTCAGC TTTCAACAGC TATGGAGACC CCTGCAGAAG AGGCTCCTAC 
1001 TGAATTTCAG TCTCCATTAC CTAAAGAGAC CACTGCAGAA GAGGCCTCTG 
1051 CTGAAATTCA GCTTCTAGCA GCTACGGAGC CTCCTGCAGA TGAAACTCCT 
1101 GCCGAAGCTC GGTCTCCACT ATCTGAGGAG ACTTCTGCAG AAGAGGCTCA 
1151 TGCTGAAGTT CAATCTCCAT TAGCTGAAGA GACCACTGCA GAAGAGGCCT 
1201 CTGCTGAAAT TCAGCTTCTA GCAGCTATAG AGGCTCCTGC AGATGAAACT 
1251 CCTGCTGAAG CTCAGTCTCC ACTATCTGAG GAGACTTCTG CAGAAGAGGC 
1301 TCCTGCTGAA GTTCAGTCTC CATCAGCTAA GGGAGTTTCT ATAGAAGAGG 
1351 CCCCTCTTGA GCTTCAGCCT CCATCAGGTG AAGAGACCAC TGCAGAAGAG 
1401 GCCTCTGCTG CAATTCAGCT TCTAGCAGCT ACAGAGGCTT CTGCAGAAGA 
1451 GGCTCCTGCT GAAGTTCAGC CTCCACCAGC TGAGGAGGCC CCCGCTGAAG 
1501 TTCAGCCTCC ACCAGCTGAG GAGGCCCCCG CTGAAGTTCA GCCTCCACCA 
1551 GCTGAGGAGG CCCCCGCTGA AGTTCAGCCT CCACCAGCTG AGGAGGCCCC 
1601 CGCTGAAGTT CAGCCTCCAC CAGCTGAGGA GGCCCCCGCT GAAGTTCAGC 
1651 CTCCACCAGC TGAGGAGGCC CCCTCTGAAG TTCAGCCTCC ACCAGCTGAG 
1701 GAGGCCCCTG CTGAAGTTCA GTCTCTACCA GCTGAGGAGA CTCCTATAGA 
1751 AGAGACCCTT GCTGCAGTAC ACTCTCCCCC AGCTGATGAT GTCCCTGCAG 
1801 AAGAGGCCTC CGTTGACAAA CATTCCCCAC CAGCTGATTT GCTTCTGACT 
1851 GAGGAGTTTC CTATAGGAGA GGCCTCTGCT GAAGTTTCAC CTCCACCATC 
1901 TGAACAAACC CCTGAAGATG AGGCTCTGGT AGAGAATGTG TCTACAGAAT 
1951 TTCAGTCACC GCAGGTGGCA GGAATTCCAG CAGTAAAATT AGGATCGGTT 
2001 GTTTTGGAAG GTGAAGCAAA ATTTGAAGAG GTTTCAAAAA TCAATTCTGT 
2051 CCTTAAAGAT TTGTCTAATA CCAATGATGG ACAGGCTCCC ACTCTTGAAA 
2101 TAGAAAGTGT TTTTCATATA GAATTAAAAC AACGTCCTCC TGAACTGTAG 
2151 TCAGGTTGTA CCTAAGCTAG CAATCAGAAG CTACATGGTT TTGGAAGAAC 
2201 ATACTTTAGA AAAGGGTGGG CAGCAGGAAG TAGCTTTGTC AATAAGGCAA 
2251 ATTAAAGGGG ACCCCAAGAC TTGGAATACA GGTTGGAAAA TGAACAATAA 
2301 AAACTGTAGC AGCATAAAAT TACTTGTGTT AATTTCATTC AAATTTATGG 
2351 CATGAAAAAT ACCTATTTTG AAAGTAAGTT TATAATTGAA AAAAATTGCT 
2401 TAAAATATCC TTCCTACAGT AAACTTGTTG ACACGAGTAA AGTTTAATCT 
2451 GCAGCCATCT TTTCTTGTCT TTGCCTTCCC TTTATAAGTA AATATAGTTT 
2501 CTAGTGGAAA AAAAAAAAAA AAAAAAAAAA AAA 



BLAST Results 
No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 18 bp to 2147 bp; peptide length: 710 
Category: similarity to known protein 
Classification: unclassified 
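
The ORF coordinates and the peptide length given above are linked by simple arithmetic: an ORF running from position 18 to position 2147 inclusive spans 2130 nucleotides, which is 710 codons, assuming the stated range excludes the stop codon. A minimal Python sketch of that consistency check follows; the function name `orf_peptide_length` is illustrative and not part of the original report.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Return the number of codons in an ORF given 1-based inclusive coordinates.

    Assumes the stop codon lies outside the stated range.
    """
    span = end_bp - start_bp + 1  # inclusive nucleotide count
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"ORF span of {span} nt is not a positive multiple of 3")
    return span // 3


if __name__ == "__main__":
    # Coordinates reported for frame 3 of this clone.
    print(orf_peptide_length(18, 2147))
```

The same check can be applied to any other "ORF from X bp to Y bp" line in the listing to catch transcription errors in the coordinates.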



1 MDRSQQTSRT GYWTMMNIPP VEKVDKEQQT YFSESEIVVI SRPDSSSTKS 
51 KEDALKHKSS GKIFASEHPE FQPATNSNEE IGQKNISRTS FTQETKKGPP 
101 VLLEDELREE VTVPVVQEGS AVKKVASAEI EPPSTEKFPA KIQPPLVEEA 
151 TAKAEPRPAE ETHVQVQPST EETPDAEAAT AVAENSVKVQ PPPAEEAPLV 
201 EFPAEIQPPS AEESPSVELL AEILPPSAEE SPSEEPPAEI LPPPAEKSPS 
251 VELLGEIRSP SAQKAPIEVQ PLPAEGALEE APAKVEPPTV EETLAEVQPL 
301 LPEEAPREEA RELQLSTAME TPAEEAPTEF QSPLPKETTA EEASAEIQLL 
351 AATEPPADET PAEARSPLSE ETSAEEAHAE VQSPLAEETT AEEASAEIQL 
401 LAAIEAPADE TPAEAQSPLS EETSAEEAPA EVQSPSAKGV SIEEAPLELQ 
451 PPSGEETTAE EASAAIQLLA ATEASAEEAP AEVQPPPAEE APAEVQPPPA 
501 EEAPAEVQPP PAEEAPAEVQ PPPAEEAPAE VQPPPAEEAP AEVQPPPAEE 
551 APSEVQPPPA EEAPAEVQSL PAEETPIEET LAAVHSPPAD DVPAEEASVD 
601 KHSPPADLLL TEEFPIGEAS AEVSPPPSEQ TPEDEALVEN VSTEFQSPQV 
651 AGIPAVKLGS VVLEGEAKFE EVSKINSVLK DLSNTNDGQA PTLEIESVFH 
701 IELKQRPPEL 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_17f10, frame 3 

PIR:A37221 neurofilament triplet H protein - rat, N = 1, Score = 480, P = 7.4e-43 

TREMBL:RNNFLH_1 Rat heavy neurofilament subunit (NF-H) mRNA, 3' end., N = 1, Score = 475, P = 1e-42 



>PIR:A37221 neurofilament triplet H protein - rat 
Length = 1,072 

HSPs: 

Score = 480 (72.0 bits), Expect = 7.4e-43, P = 7.4e-43 
Identities = 185/622 (29%), Positives = 320/622 (51%) 



Query: 


33 


Sbjct: 


436 


Query: 


93 


Sbjct: 


496 


Query: 


153 


Sbjct: 


555 


Query: 


212 


Sbjct: 


610 


Query: 


269 


Sbjct: 


670 


Query: 


328 


Sbjct: 


722 


Query: 


384 



SE +1 V+ + + 



+E 



+ + + ++ 



E Q 



E G 



+ TS 



+A+ PAE 



+SP 



V+ PAE 



++EE 



V+ P+T ++P 



K +■ AE + P+ K PA+++ P ++ A 



+ A A++ +V+ P 



++P 



PAE 



P+ 



AE 



P++ +SP E + PAE 



KSP+ V+ E +SP+ K+P+ 



++P +V+ P ++ +E + ++P E A+ 

--KSPVEVKSPASVKSPSEAKSPAGAKSPAE-AKS— 



++PAE ++P 
-PVVAKSPAEAKSP 721 



383 



E + P ++ AE S 



A + PA+ ++PAEA+SP+ E S E+A + V+ 
- AEAKS PAEAKS PAEAKS PV-EVKS PEKAKSPVKEGAK 775 



Query: 384 PLAEETTAEEASAEIQLLAAIEAPAD-ETPAEAQSPLSEET-SAEEAPA-EVQSPSAKGV 440 

LAE + E+A + ++ I+ PA+ ++P +A+SP+ EE S E+A +V+SP AK 
Sbjct: 776 SLAEAKSPEKAKSPVK--EEIKPPAEVKSPEKAKSPMKEEAKSPEKAKTLDVKSPEAKTP 833 



Query: 441 SIEEA--PLELQPPSGEETTA-EEASAAIQLLAATEASA---EEAPAEVQPPPAEEAPAE 494 

+ EEA P +++ P ++ A EEA + + TE A EE + V+ A+E P + 
Sbjct: 834 AKEEAKRPADIRSPEQVKSPAKEEAKSPEKEETRTEKVAPKKEEVKSPVEEVKAKEPPKK 893 

Query: 495 VQPPPAEEAP-AEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPS 553 

V+ P EV+ +EAP E Q P AEE + P +++P E + EEA 
Sbjct: 894 VEEEKTPATPKTEVKESKKDEAPKEAQKPKAEEKEPLTEKP--KDSPGEAKK---EEAKE 948 

Query: 554 EVQPPPAEEAPAEV QSLP AEETPIEETL -- AAVHSPPADDVPAEEASVD-KHS 603 

+ P EE PA++ * ++ P AE+ +E + P ++VPA D K 
Sbjct: 949 KKAAAPEEETPAKLGVKEEAKPKEKAEDAKAKEPSKPSEKEKPKKEEVPAAPEKKDTKEE 1008 

Query: 604 PPADLLLTEEFPIGEASAEVSPP--PSEQT-PEDEALVENVSTEFQSPQ 649 

+ EE P +A A+ P E + P+ E ++ ST+ + Q 
Sbjct: 1009 KTTESKKPEEKPKMQAKAKEEDKGLPQEPSKPKTEKAEKSSSTDQKDSQ 1057 



Score = 473 (71.0 bits), Expect = 4.8e-42, P = 4.8e-42 
Identities = 184/628 (29%), Positives = 310/628 (49%) 

Query: 18 IPPVEKVDKEQQTYFSESEIVVISRP---DSSSTKSKEDALKHKSSGKIFASEHPEFQPA 74 

I VEK +KE ++E + ++ + E+ + + G+ A+ P + A 
Sbjct: 440 IKVVEKSEKETVIVEEQTEEIQVTEEVTEEEDKEAQGEEEEEAEEGGEEAATTSPPAEEA 499 

Query: 75 TNSNEEIGQKNISRTSFTQETKKGPPVLLEDELREEVTVPVVQEGSAVKKVASAEIEPPS 134 

+ +E + + + + KP E+ E P + AK + AE + P+ 
Sbjct: 500 ASPE KET -KS PVKEEAKS PAEAKS PA EAKSPAEAKSPAEVKSPAEVK-SPAEAKSPA 554 

Query: 135 TEKFPAKIQPPLVEEATAKAEPRPAEETHVQVQ-PSTEETPDAEAATAVAENSVKVQPPP 193 

K PA+++ P ++ A+A+ ++ +V+ P+T ++P + A A++ +V+ P 
Sbjct: 555 EAKSPAEVKSPATVKSPAEAKSPAEAKSPAEVKSPATVKSPGEAKSPAEAKSPAEVKSPV 614 

Query: 194 AEEAPL-VEFPAEIQPPSAEESPS-VELLAEILPPSAEESPSE-EPPAEILPPPAEKSPS 250 

++P + PA ++ P +SP+ + AE+ P+ +SP E + PAE+ P KSP+ 
Sbjct: 615 EAKSPAEAKSPASVKSPGEAKSPAEAKSPAEVKSPATVKSPVEAKSPAEVKSPVTVKSPA 674 

Query: 251 -VELLGEIRSPSAQKAPIEVQ-PLPAEGALE-EAPAKVEPPTVEETLAEVQPLLPEEAPR 307 

+ E++SP++ K+P E + P A+ E ++P + P ++ AE + P ++P 
Sbjct: 675 EAKSPVEVKSPASVKSPSEAKSPAGAKSPAEAKSPVVAKSPAEAKSPAEAKPPAEAKSPA 734 

Query: 308 EEA RELQLS TAME -- TPAE-EAPTEFQSP LP-KE TTAEEASAEIQLLAATE -- 354 

E + + E +PAE ++P E +SP P KE + AE S E E 
Sbjct: 735 EAKSPAEAKSPAEAKSPAEAKSPVEVKSPEKAKSPVKEGAKSLAEAKSPEKAKSPVKEEI 794 

Query: 355 -PPAD-ETPAEARSPLSEET-SAEEAHA-EVQSPLAEETTAEEAS--AEIQLLAAIEAPA 408 

PPA+ ++P +A+SP+ EE S E+A +V+SP A+ EEA A+I+ +++PA 
Sbjct: 795 KPPAEVKSPEKAKSPMKEEAKSPEKAKTLDVKSPEAKTPAKEEAKRPADIRSPEQVKSPA 854 

Query: 409 DETPAEAQSPLSEETSAEE-APA--EVQSPSAKGVSIEEAPLELQPPSGEETTAEEASAA 465 

E EA+SP EET E+ AP EV+SP +EE + +PP E EE + A 
Sbjct: 855 KE---EAKSPEKEETRTEKVAPKKEEVKSP------VEEVKAK-EPPKKVE---EEKTPA 901 

Query: 466 IQLLAATEASAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAE 525 

E+ +EAP E Q P AEE + P +++P E + A+E A P E 
Sbjct: 902 TPKTEVKESKKDEAPKEAQKPKAEEKEPLTEKP--KDSPGEAKKEEAKEKKAAA---PEE 956 

Query: 526 EAPAEV----QPPPAEEAPAEVQPPPAEEAPSEVQPPPAEEAPAEVQSLPAEETPIEETL 581 

E PA++ + P E+A P++ PSE + P EE PA + +E E+ 
Sbjct: 957 ETPAKLGVKEEAKPKEKAEDAKAKEPSK--PSEKEKPKKEEVPAAPEKKDTKEEKTTESK 1014 

Query: 582 AAVHSPPADDVPAEEASVDKHSPPADLL-LTEEFPIGEASAEVSPPPSEQTPEDEA 636 

P EE DK P TE+ ++ + PSE+ PED+A 
Sbjct: 1015 KPEEKPKMQAKAKEE---DKGLPQEPSKPKTEKAEKSSSTDQKDSQPSEKAPEDKA 1067 


Score = 421 (63.2 bits), Expect = 3.7e-36, P = 3.7e-36 
Identities = 162/540 (30%), Positives = 275/540 (50%) 

Query: 135 TEKFPAKIQPPLVEEATAKAEPR-----PAEETHVQVQPSTEETPDAEAATAVAENSVKV 189 

TE P KI P + K+E + +E+ V V+ TEE E T E + 
Sbjct: 419 TEGLP-KI-PSMSTHIKVKSEEKIKVVEKSEKETVIVEEQTEEIQVTEEVTE--EEDKEA 474 

Query: 190 QPPPAEEAPLVEFPAEIQPPSAEESPSVELLAEILPPSAEE--SPSE-EPPAEILPPPAE 246 

Q EEA A P AEE+ S E E P EE SP+E + PAE P 
Sbjct: 475 QGEEEEEAEEGGEEAATTSPPAEEAASPE--KETKSPVKEEAKSPAEAKSPAEAKSPAEA 532 

Query: 247 KSPSVELLGEIRSPSAQKAPIEVQPLPAEGALEEAPAKVEPPTVEETLAEVQPLLPEEAP 306 

KSP+ E++SP+ K+P E + PAE ++PA+V+ P ++ AE + ++P 
Sbjct: 533 KSPA EVKS PAEVKS PAEAKS- PAEA KS PAEVKSP AT VKS PAEAKS PAEAKS P 583 

Query: 307 REEARELQLSTAME--TPAE-EAPTEFQSPLPKETTAEEAS-AEIQLLAATEPPAD-ETP 361 

[match line illegible in source] 
Sbjct: 584 AEVKSPATVKSPGEAKSPAEAKSPAEVKSPVEAKSPAEAKSPASVKSPGEAKSPAEAKSP 643 

Query: 362 AEARSPLSEETSAE-EAHAEVQSPLAEETTAEEASAEIQLLAAIEAPAD-ETPAEAQSPL 419 

[match line illegible in source] 
Sbjct: 644 AEVKS PATVKSPVEAKS PAEVKS PVT VKS PAE- AKSPVE VKS PAS VKS PSEAKSP- 697 

Query: 420 SEETSAEEAPAEVQSPS-AKGVSIEEAPLELQPPSGEETTAEEASAAIQLLAATEASAEE 478 

[match line illegible in source] 
Sbjct: 698 AGAKSPAEAKSPVVAKSPAEAKSPAEAKPPAEAKSPAEAKSPAE-------AKSPAEAK- 749 

Query: 479 APAEVQPPPAEEAPAEVQPPPAEEAP--AEVQPPPAEEAPA--EVQPPPAEEAPAEVQPP 534 

[match line illegible in source] 
Sbjct: 750 SPAEAKSPVEVKSPEKAKSPVKEGAKSLAEAKSPEKAKSPVKEEIKPPAEVKSPEKAKSP 809 

Query: 535 PAEEAPAEVQPPPAEEAPSEVQPPPAEEA--PAEVQSLPAEETPIEETLAAVHSPPADDV 592 

[match line illegible in source] 
Sbjct: 810 MKEEAKS PEKAKTLDVKS PE AKT PAKEEAKR P ADI RS PEQVKS PAKEE AKS PEKEET 866 

Query: 593 PAEEASVDKHS--PPADLLLTEEFPIGEASAEVSPPPSEQTPEDEALVENVSTEFQSPQV 650 

E++K P+++EP + E P + +T E++ EQP+ 
Sbjct: 867 RTEKVAPKKEEVKSPVEEVKAKEPP--KKVEEEKTPATPKTEVKESKKDEAPKEAQKPKA 924 

Query: 651 AGIPAVKLGSVVLEGEAKFEEVSK 674 

+ GEAK EE + 
Sbjct: 925 EEKEPLTEKPKDSPGEAKKEEAKE 948 




Score = 406 (60.9 bits), Expect = 1.7e-34, P = 1.7e-34 
Identities = 123/390 (31%), Positives = 213/390 (54%) 

Query: 308 EEARELQLSTAMETPAEEAPTEFQSPLPKETTAEEASAEIQLLAATEPPADETPA---EA 364 

E+ E+Q++ E EE E Q +E AEE E AT PPA+E + E 
Sbjct: 455 EQTEEIQVT EEVTEEEDKEAQGE -- EEEEAEEGGEEA ATTSPPAEEAASPEKET 506 

Query: 365 RSPLSEETSAEEAHAEVQSPLAEETTAEEAS-AEIQLLAAIEAPAD-ETPAEAQSPLSEE 422 

+SP+ EE + AE +SP ++ AE S AE++ A +++PA+ ++PAEA+SP + 
Sbjct: 507 KSPVKEEAKSP AEAKS PAEAKSPAEAKS PAEVKS PAEVKS PAEAKSPAEAKS PAEVK 563 

Query: 423 TSAE-EAPAEVQSPS-AKGVSIEEAPLELQPPSGEETTAEEASAAIQLLAATEASAEEAP 480 

+ A ++PAE +SP+ AK + ++p ++ P GE + EA + ++ + £A ++P 
Sbjct: 564 S PATVKS PAEAKSPAEAKS PAEVKS PAT VKS P-GEAKS P AEAKS PAEVKS PVEA KS P 619 

Query: 481 AEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAP 540 

AE + P + ++P E + P ++PAEV+ P ++P E + P ++P V+ P ++P 
Sbjct: 620 AEAKSPASVKSPGEAKSPAEAKSPAEVKSPATVKSPVEAKSPAEVKSPVTVKSPAEAKSP 679 

Query: 541 AEVQPPPAEEAPSEVQPPPAEEAPAEVQSLPAEETPIEETLAAVHSPPAD-DVPAEEASV 599 

[match line illegible in source] 
Sbjct: 680 VEVKSPASVKSPSEAKSPAGAKSPAEAKSPVVAKSPAEAKSPAEAKPPAEAKSPAEAKSP 739 

Query: 600 DKHSPPADLLLTEEFPIGEASAEVSPPPSEQTPEDEALVENVSTEFQSPQVAGIPAVKLG 659 

+ PA+ E ++ EV P ++P E ++++ E +SP+ A P VK 
Sbjct: 740 AEAKS P AEAKS PAE AKSPVE VKS PEKAKSPVKEG- AKS LA- EAKS PEKAKSP-VK-E 792 

Query: 660 SVVLEGEAKFEEVSKINSVLKDLSNTNDGQAPTLEIES 697 

+ E K E +K S +K+ + + + +A TL+++S 
Sbjct: 793 EIKPPAEVKSPEKAK--SPMKEEAKSPE-KAKTLDVKS 827 




Score = 255 (38.3 bits), Expect = 5.5e-18, P = 5.5e-18 
Identities = 124/420 (29%), Positives = 199/420 (47%) 

Query: 252 ELLGEI RSPS AQKAPI EVQPLPA EGALEEAPAKVEPPTVEETLAEVQPLLPEEAP 306 

ELLG+I+ A +A + + A AL E A++E TV+ TL + 
Sbjct: 236 ELLGQIQGCGAAQAQAQAEARDALKCDVTSALREIRAQLEGHTVQSTLQSEEWFRVRLDR 295 

Query: 307 REEARELQLSTAMETPAEEAPTEFQSPLPKETTAEEASAEIQLLAATEPPADETPAEARS 366 

EA ++ + AM + EE TE++ L TT E++ L +T+ + +E 
Sbjct: 296 LSEAAKVN-TDAMRSAQEEI-TEYRRQLQARTT------ELEALKSTKESLERQRSELED 347 

Query: 367 PLSEE-TSAEEAHAEVQSPLAEETTAEEASA--EIQLLAAIEAPAD-ETPAEAQSPLSEE 422 

+ S ++A ++ + L TEA+ EQL++ D E A + EE 
Sbjct: 348 RHQVDMASYQDAIQQLDNEL-RKTKWEMAAQLREYQDLLNVKMALDIEIAAYRKLLEGEE 406 

Query: 423 TSAEEAPAEV QS PS - AKGVS I E- EAPLELQPPSGEETT- AEEAS AAIQLLA-A 471 

P+ + PS + + ++ e +++ S +ET EE + IQ+ 
Sbjct: 407 CRIGFGPSPFSLTEGLPKIPSMSTHIKVKSEEKIKVVEKSEKETVIVEEQTEEIQVTEEV 466 

Query: 472 TEASAEEAPAEVQPPPAEEAPAEVQP-- PPAEEAPA E VQ P PPAEEA- - PAEVQP P PA 524 

TE +EA E + AEE E PPAEEA + E + P EEA PAE + P 
Sbjct: 467 TEEEDKEAQGE-EEEEAEEGGEEAATTSPPAEEAASPEKETKSPVKEEAKSPAEAKSPAE 525 

Query: 525 EEAPAEVQPPPAEEAPAEVQPPPAEEAPSEVQPPPAEEAPAEVQSLPAE-ETPIE-ETLA 582 

++PAE + P ++PAEV+ P ++P+E + P ++PA V+S PAE ++P E ++ A 
Sbjct: 526 AKSPAEAKSPAEVKSPAEVKSPAEAKSPAEAKSPAEVKSPATVKS-PAEAKSPAEAKSPA 584 

Query: 583 AVHSPPADDVPAEEASVDKHSPPADLLLTEEFPIGEASAEVSPPPSEQTP-EDEALVENV 641 

V SP PES + PA++ E ++ AE PS ++P E ++ E 
Sbjct: 585 EVKSPATVKSPGEAKS PAEAKS PAEVKS PVE AKSPAEAKS PAS VKS PGEAKS PAEAK 641 

Query: 642 S-TEFQSPQVAGIP 654 

S E +SP P 
Sbjct: 642 SPAEVKSPATVKSP 655 

Score = 253 (38.0 bits), Expect = 9.0e-18, P = 9.0e-18 
Identities = 115/364 (31%), Positives = 166/364 (45%) 

Query: 110 EVTVPVVQEGSAVKKVASAEIEPPSTEKFPAKIQPPLVEEATAKAEPRPAE-ETHVQVQ- 167 

E PVV + A K + AE +PP+ K PA+ + P ++ A+A+ PAE ++ V+V+ 
Sbjct: 705 EAKSPVVAKSPAEAK-SPAEAKPPAEAKSPAEAKSPAEAKSPAEAKS-PAEAKSPVEVKS 762 

Query: 168 PSTEETPDAEAATAVAE--NSVKVQPPPAEEA--PL-VEFPAEIQPPSAEE--SPSVELL 220 

P ++P E A ++AE + K + P EE P V+ P + + P EE SP 
Sbjct: 763 PEKAKSPVKEGAKSLAEAKSPEKAKSPVKEEIKPPAEVKSPEKAKSPMKEEAKSPEKAKT 822 

Query: 221 AEILPPSAEESPSEEP--PAEILPPPAEKSPSVELLGEIRSPSAQKAPIE-VQPLPAE-- 275 

++ P A+ EE PA+I P KSP+ E E +SP ++ EVP E 
Sbjct: 823 LDVKSPEAKTPAKEEAKRPADIRSPEQVKSPAKE---EAKSPEKEETRTEKVAPKKEEVK 879 

Query: 276 GALEEAPAKVEPPTVEETLAEVQPLLPEEAPREEARELQLSTAMETPAEEA-P-TEFQSP 333 

+EE AK P VEE E P P+ +E ++ A + AEE P TE 

Sbjct: 880 SPVEEVKAKEPPKKVEE---EKTPATPKTEVKESKKDEAPKEAQKPKAEEKEPLTEKPKD 936 

Query: 334 LPKETTAEEASAEIQLLAATEPPADETPAE--ARSPLSEETSAEEAHA-EVQSPLAEETT 390 

P E EEA + AA P +ETPA+ + + AE+A A E P +E 

Sbjct: 937 SPGEAKKEEAKEK---KAAA--PEEETPAKLGVKEEAKPKEKAEDAKAKEPSKPSEKEKP 991 

Query: 391 A-EEASAEIQLLAAIEAPADETPAEAQSPLSEETSAEEAPAEVQSPSA-KGVSIEEAPLE 448 

EE A + E E+ + P + + EE Q PS K E++ 

Sbjct: 992 KKEEVPAAPEKKDTKEEKTTESKKPEEKPKMQAKAKEEDKGLPQEPSKPKTEKAEKSSST 1051 

Query: 449 LQPPSGEETTAEEASAA 465 

Q S A E AA 

Sbjct: 1052 DQKDSQPSEKAPEDKAA 1068 
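
The percentages printed on the "Identities" and "Positives" lines of the HSP headers above appear to be truncated rather than rounded: 185/622 is about 29.7% but is reported as 29%, and 213/390 is about 54.6% but is reported as 54%. A short Python sketch of that convention follows; the helper name `blast_percent` is hypothetical, and the truncation rule is inferred from the figures in this listing rather than taken from the BLAST documentation.

```python
def blast_percent(matches: int, aligned_length: int) -> int:
    """Integer percentage as it appears in the HSP headers of this listing.

    Uses floor division, i.e. truncation toward zero for non-negative input.
    """
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned_length


if __name__ == "__main__":
    # (matches, aligned length) pairs copied from the HSP headers above.
    for matches, length in [(185, 622), (320, 622), (213, 390), (115, 364)]:
        print(f"{matches}/{length} ({blast_percent(matches, length)}%)")
```

Recomputing these values is a quick way to detect digits that were misread during text extraction of the score lines.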

Pedant information for DKFZphtes3_17f10, frame 3 



Report for DKFZphtes3_17f10.3 

[LENGTH] 710 

[MW] 75131.94 

[pI] 4.02 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 34.08 % 

SEQ MDRSQQTSRTGYWTMMNIPPVEKVDKEQQTYFSESEIVVISRPDSSSTKSKEDALKHKSS 

SEG 

PRD cccccccccccccccccccceeehhhhhhhccccceeeeeccccccccchhhhhhhhccc 

SEQ GKIFASEHPEFQPATNSNEEIGQKNISRTSFTQETKKGPPVLLEDELREEVTVPVVQEGS 

SEG 

PRD cceeecccccccccccccccccccccccccceeeecccccchhhhhhhhhheeeeccccc 

SEQ AVKKVASAEIEPPSTEKFPAKIQPPLVEEATAKAEPRPAEETHVQVQPSTEETPDAEAAT 

SEG xxxxxxxxxxx 

PRD chhhhhhhccccccccccccccccchhhhhhhhhccccccceeeecccccccccchhhhh 

SEQ AVAENSVKVQPPPAEEAPLVEFPAEIQPPSAEESPSVELLAEILPPSAEESPSEEPPAEI 

SEG xxxx xxxxxxxxxxxxxxxxxxxx 

PRD hhhhhcccccccccccceeeeccccccccccccccchhhhhhcccccccccccccccccc 

SEQ LPPPAEKSPSVELLGEIRSPSAQKAPIEVQPLPAEGALEEAPAKVEPPTVEETLAEVQPL 

SEG xxxxxx xxxxxxxxxxxxx xxx 

PRD cccccccccccccccccccccccccccccccccchhhhhcccccccccchhhhhhhhhhc 

SEQ LPEEAPREEARELQLSTAMETPAEEAPTEFQSPLPKETTAEEASAEIQLLAATEPPADET 

SEG xxxxxxxxxxxxxxx . xxxxxxxxxx xxxxxxxxxx 

PRD ccccchhhhhhhhhhhhhhccccccccccccccccchhhhhhhhhhhhhhhhcccccccc 
SEQ PAEARSPLSEETSAEEAHAEVQSPLAEETTAEEASAEIQLLAAIEAPADETPAEAQSPLS 

SEG xxxx .... xxxxxxxxxxxx xxxxxxxxxxxx xxxx 

PRD cccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccc 

SEQ EETSAEEAPAEVQSPSAKGVSIEEAPLELQPPSGEETTAEEASAAIQLLAATEASAEEAP 

SEG xxxxxxxxxxx xxxxxxxxxxx xxxxxxxx 

PRD chhhhhcccccccccccceeecccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhc 

SEQ AEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAP 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ AEVQPPPAEEAPSEVQPPPAEEAPAEVQSLPAEETPIEETLAAVHSPPADDVPAEEASVD 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccchhhhhhhhhcccccccccccccccc 

SEQ KHSPPADLLLTEEFPIGEASAEVSPPPSEQTPEDEALVENVSTEFQSPQVAGIPAVKLGS 

SEG 

PRD cccccceeeeeccccccccccccccccccccccchhhhhccccccccccccccccccccc 

SEQ VVLEGEAKFEEVSKINSVLKDLSNTNDGQAPTLEIESVFHIELKQRPPEL 

SEG 

PRD eeeehhhhhhhhccceeeeeeccccccccceeeehhhhhhhhhhcccccc 

(No Prosite data available for DKFZphtes3_17f10.3) 
(No Pfam data available for DKFZphtes3_17f10.3) 
DKFZphtes3_17117 



group: metabolism 

DKFZphtes3_17117 encodes a novel 626 amino acid protein with similarity to transketolases (EC 
2.2.1.1). 

The novel protein contains an ATP/GTP-binding site motif A (P-loop). It is a new testis-specific 
transketolase. Transketolase requires thiamin pyrophosphate as cofactor and shows a 
wide specificity for both reactants, e.g. converts hydroxypyruvate and R-CHO into CO(2) and 
R-CHOH-CO-CH(2)OH. 

The new protein can find application in modulation of metabolic pathways involving this 
transketolase activity and as a new enzyme for biotechnological production processes. 



strong similarity to transketolases 

few EST hits (all from testis or pooled libraries containing testis) 
testis specific transketolase? 

Sequenced by GBF 

Locus: unknown 

Insert length: 2688 bp 

Poly A stretch at pos. 2649, polyadenylation signal at pos. 2630 



1 GACAAAAGAG AGATGATGGC CAACGACGCC AAGCCCGACG TGAAGACCGT 
51 GCAGGTGCTG CGGGACACAG CCAACCGCCT GCGGATCCAT TCCATCAGGG 
101 CCACGTGTGC CTCTGGTTCT GGCCAGCTCA CGTCGTGCTG CAGTGCAGCG 
151 GAGGTCGTGT CTGTCCTCTT CTTCCACACG ATGAAGTATA AACAGACAGA 
201 CCCAGAACAC CCGGACAACG ACCGGTTCAT CCTCTCCAGG GGACATGCTG 
251 CTCCTATCCT CTATGCTGCT TGGGTGGAGG TGGGTGACAT CAGTGAATCT 
301 GACTTGCTGA ACCTGAGGAA ACTTCACAGC GACTTGGAGA GACACCCTAC 
351 CCCGCGATTG CCGTTTGTTG ACGTGGCAAC AGGGTCCCTA GGTCAGGGAT 
401 TAGGTACTGC ATGTGGAATG GCTTATACTG GCAAGTACCT TGACAAGGCC 
451 AGCTACCGGG TGTTCTGCCT TATGGGAGAT GGCGAATCCT CAGAAGGCTC 
501 TGTGTGGGAG GCTTTTGCTT TTGCCTCCCA CTACAACTTG GACAATCTCG 
551 TGGCGGTCTT CGACGTGAAC CGCTTGGGAC AAAGTGGCCC TGCACCCCTT 
601 GAGCATGGCG CAGACATCTA CCAGAATTGC TGTGAAGCCT TTGGATGGAA 
651 TACTTACTTA GTGGATGGCC ATGATGTGGA GGCCTTGTGC CAAGCATTTT 
701 GGCAAGCAAG TCAAGTGAAG AACAAGCCTA CTGCTATAGT TGCCAAGACC 
751 TTCAAAGGTC GGGGTATTCC AAATATTGAG GATGCAGAAA ATTGGCATGG 
801 AAAGCCAGTG CCAAAAGAAA GAGCAGATGC AATTGTCAAA TTAATTGAGA 
851 GTCAGATACA GACCAATGAG AATCTCATAC CAAAATCGCC TGTGGAAGAC 
901 TCACCTCAAA TAAGCATCAC AGATATAAAA ATGACCTCCC CACCTGCTTA 
951 CAAAGTTGGT GACAAGATAG CTACTCAGAA AACATATGGT TTGGCTCTGG 
1001 CTAAACTGGG CCGTGCAAAT GAAAGAGTTA TTGTTCTGAG TGGTGACACG 
1051 ATGAACTCCA CCTTTTCTGA GATATTCAGG AAAGAACACC CTGAGCGTTT 
1101 CATAGAGTGT ATTATTGCTG AACAAAACAT GGTAAGTGTG GCACTAGGCT 
1151 GTGCTACACG TGGTCGAACC ATTGCTTTTG CTGGTGCTTT TGCTGCCTTT 
1201 TTTACTAGAG CATTCGATCA GCTCCGAATG GGAGCCATTT CTCAAGCCAA 
1251 TATCAACCTT ATTGGTTCCC ACTGTGGGGT ATCCACTGGA GAAGATGGAG 
1301 TCTCCCAGAT GGCCCTGGAG GATCTAGCCA TGTTCCGAAG CATTCCCAAT 
1351 TGTACTGTTT TCTATCCAAG TGATGCCATC TCGACAGAGC ATGCTATTTA 
1401 TCTAGCCGCC AATACCAAGG GAATGTGCTT CATTCGAACC AGCCAACCAG 
1451 AAACTGCAGT TATTTATACC CCACAAGAAA ATTTTGAGAT TGGCCAGGCC 
1501 AAGGTGGTCC GCCACGGTGT CAATGATAAA GTCACAGTAA TTGGAGCTGG 
1551 AGTTACTCTC CATGAAGCCT TAGAAGCTGC TGACCATCTT TCTCAACAAG 
1601 GTATTTCTGT CCGTGTCATC GACCCATTTA CCATTAAACC CCTGGATGCC 
1651 GCCACCATCA TCTCCAGTGC AAAAGCCACA GGCGGCCGAG TTATCACAGT 
1701 GGAGGATCAC TACAGGGAAG GTGGCATTGG AGAAGCTGTT TGTGCAGCTG 
1751 TCTCCAGGGA GCCTGATATC CTTGTTCATC AACTGGCAGT GTCAGGAGTG 
1801 CCTCAACGTG GGAAAACTAG TGAATTGCTG GATATGTTTG GAATCAGTAC 
1851 CAGACACATT ATAGCAGCCG TAACACTTAC TTTAATGAAG TAAACTAGGC 
1901 TTATTTCTAA AAAGTCAAGT CTATTGGCTT TGGCCCAAAA GCACTGGTAT 
1951 CTTTGTATTA AATTCATGTT TATTGTCACA AAACCATTAT TTATACCTAT 
2001 ACAGTTGTAC TGTTTCTTTT AAAGCAAAGC CATTTAACAT CTTTCTTCAT 
2051 TCCTAATTTG GAAATTAAAG TTTACCTTTC TGTTAATCTA TGTATAAATG 
2101 TTACTCTGAG TTATTAATGT GGATTTTAAA ATTGTAAGCA ATAGAATAGG 
2151 AAATAAAACA ACTACCTAAT ACAAATATTT CTGATAAGAC TACAAATATC 
2201 TGACTGAGCT GGGGATTAAA GTAGAGGTAA CTGTATCTTA AATGAGTATG 
2251 ATTTCCTTGT AAGTTAAAAA AATTGAAATT TAATTGTAGA CTTCAATAGT 
2301 CCAAGTTTTG AAGGATGTTT GAGCTTTTGT ATAATGCCAT TTATACCTGC 
2351 AGTTTTACAG ATAATGTTTG ACTGCAGTTG CCTTGGAAAT TCCTCCAAAG 
2401 TTTGCCTTCA TCTCTCCTCT ACAGTTTGGA GGTGATGGTG CAGCAGTGGA 
2451 ACATCTCTTG ATGCACCACA CTACTTGTGT TCTGTGAAGT GATGAAAGTA 
2501 TAACTGGTTC TAGTTTGCAC ACTACACACA TAGTTTTGTG AAGCTTCAGA 

2551 AATGTTTTTT CTTTTCCTTG TGGCCAAACC AGTTTGTTAA TCTGATTATA 

2601 TTCATCTGCT AATGATACTA AAGTTAATGT AATAAAGCAT TTAAAAATCA 

2651 GAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAA 



BLAST Results 



NO BLAST result 



Medline entries 



96214928: 
Amplification of the transketolase gene in desensitization-resistant mutant Y1 mouse 
adrenocortical tumor cells. 

99123875: 
Properties and functions of the thiamin diphosphate dependent enzyme transketolase. 



Peptide information for frame 1 



ORF from 13 bp to 1890 bp; peptide length: 626 
Category: strong similarity to known protein 
Classification: Metabolism 
Prosite motifs: ATP_GTP_A (595-603) 



1 MMANDAKPDV KTVQVLRDTA NRLRIHSIRA TCASGSGQLT SCCSAAEVVS 

51 VLFFHTMKYK QTDPEHPDND RFILSRGHAA PILYAAWVEV GDISESDLLN 

101 LRKLHSDLER HPTPRLPFVD VATGSLGQGL GTACGMAYTG KYLDKASYRV 

151 FCLMGDGESS EGSVWEAFAF ASHYNLDNLV AVFDVNRLGQ SGPAPLEHGA 

201 DIYQNCCEAF GWNTYLVDGH DVEALCQAFW QASQVKNKPT AIVAKTFKGR 

251 GIPNIEDAEN WHGKPVPKER ADAIVKLIES QIQTNENLIP KSPVEDSPQI 

301 SITDIKMTSP PAYKVGDKIA TQKTYGLALA KLGRANERVI VLSGDTMNST 

351 FSEIFRKEHP ERFIECIIAE QNMVSVALGC ATRGRTIAFA GAFAAFFTRA 

401 FDQLRMGAIS QANINLIGSH CGVSTGEDGV SQMALEDLAM FRSIPNCTVF 

451 YPSDAISTEH AIYLAANTKG MCFIRTSQPE TAVIYTPQEN FEIGQAKVVR 

501 HGVNDKVTVI GAGVTLHEAL EAADHLSQQG ISVRVIDPFT IKPLDAATII 

551 SSAKATGGRV ITVEDHYREG GIGEAVCAAV SREPDILVHQ LAVSGVPQRG 

601 KTSELLDMFG ISTRHIIAAV TLTLMK 
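
The peptide above can be cross-checked against the nucleotide listing by translating the insert in the stated reading frame. The sketch below does this for the first 50 nucleotides of the insert, starting at the ORF start position 13, and assumes the standard genetic code; the names `CODON_TABLE` and `translate` are illustrative and not part of the original report.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, start_bp: int = 1) -> str:
    """Translate from a 1-based start position, dropping any trailing partial codon."""
    seq = dna.upper().replace(" ", "")[start_bp - 1:]
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    # Positions 1-50 of the insert, copied from the nucleotide listing.
    first_row = "GACAAAAGAG AGATGATGGC CAACGACGCC AAGCCCGACG TGAAGACCGT"
    print(translate(first_row, start_bp=13))
```

Run on the full insert, the same function would stop being meaningful after the first `*` (stop codon), so a complete check would truncate the result there before comparing it with the peptide listing.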



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_171l7, frame 1 

SWISSPROT:TKT_MOUSE TRANSKETOLASE (EC 2.2.1.1) (TK) (P68)., N = 1, 
Score = 2222, P = 2.5e-230 

SWISSPROT:TKT_RAT TRANSKETOLASE (EC 2.2.1.1) (TK)., N = 1, Score = 
2202, P = 3.3e-228 

TREMBL:RN09256_1 product: "transketolase"; Rattus norvegicus 
Sprague-Dawley transketolase mRNA, complete cds., N = 1, Score = 2202, 
P = 3.3e-228 

SWISSPROT:TKT_HUMAN TRANSKETOLASE (EC 2.2.1.1) (TK)., N = 1, Score = 
2200, P = 5.3e-228 



>SWISSPROT:TKT_MOUSE TRANSKETOLASE (EC 2.2.1.1) (TK) (P68). 
Length = 623 

HSPs: 

Score = 2222 (333.4 bits), Expect = 2.5e-230, P = 2.5e-230 
Identities = 417/614 (67%), Positives = 501/614 (81%) 

Query: 7 KPDVKTVQVLRDTANRLRIHSIRATCASGSGQLTSCCSAAEVVSVLFFHTMKYKQTDPEH 66 

KPD + +Q L+DTANRLRI SI+AT A+GSG TSCCSAAE+++VLFFHTM+YK DP + 
Sbjct: 6 KPDQQKLQALKDTANRLRISSIQATTAAGSGHPTSCCSAAEIMAVLFFHTMRYKALDPRN 65 



Query: 67 PDNDRFILSRGHAAPILYAAWVEVGDISESDLLNLRKLHSDLERHPTPRLPFVDVATGSL 126 

P NDRF+LS+GHAAPILYA W E G + E++LLNLRK+ SDL+ HP P+ F DVATGSL 
Sbjct: 66 PHNDRFVLSKGHAAPILYAVWAEAGFLPEAELLNLRKISSDLDGHPVPKQAFTDVATGSL 125 

Query: 127 GQGLGTACGMAYTGKYLDKASYRVFCLMGDGESSEGSVWEAFAFASHYNLDNLVAVFDVN 186 

GQGLG ACGMAYTGKY DKASYRV+C++GDGE SEGSVWEA AFA Y LDNLVA+FD+N 
Sbjct: 126 GQGLGAACGMAYTGKYFDKASYRVYCMLGDGEVSEGSVWEAMAFAGIYKLDNLVAIFDIN 185 

Query: 187 RLGQSGPAPLEHGADIYQNCCEAFGWNTYLVDGHDVEALCQAFWQASQVKNKPTAIVAKT 246 

RLGQS PAPL+H DIYQ CEAFGW+T +VDGH VE LC+AF QA K++PTAI+AKT 
Sbjct: 186 RLGQSDPAPLQHQVDIYQKRCEAFGWHTIIVDGHSVEELCKAFGQA---KHQPTAIIAKT 242 

Query: 247 FKGRGIPNIEDAENWHGKPVPKERADAIVKLIESQIQTNENLIPKSPVEDSPQISITDIK 306 

FKGRGI IED E WHGKP+PK A+ I++ I SQ+Q+ + ++ P ED+P + I +I+ 
Sbjct: 243 FKGRGITGIEDKEAWHGKPLPKNMAEQIIQEIYSQVQSKKKILATPPQEDAPSVDIANIR 302 

Query: 307 MTSPPAYKVGDKIATQKTYGLALAKLGRANERVIVLSGDTMNSTFSEIFRKEHPERFIEC 366 

M +PP+YKVGDKIAT+K YGLALAKLG A++R+I L GDT NSTFSE+F+KEHP+RFIEC 
Sbjct: 303 MPTPPSYKVGDKIATRKAYGLALAKLGHASDRIIALDGDTKNSTFSELFKKEHPDRFIEC 362 

Query: 367 IIAEQNMVSVALGCATRGRTIAFAGAFAAFFTRAFDQLRMGAISQANINLIGSHCGVSTG 426 

IAEQNMVS+A+GCATR RT+ F FAAFFTRAFDQ+RM AIS++NINL GSHCGVS G 
Sbjct: 363 YIAEQNMVSIAVGCATRDRTVPFCSTFAAFFTRAFDQIRMAAISESNINLCGSHCGVSIG 422 

Query: 427 EDGVSQMALEDLAMFRSIPNCTVFYPSDAISTEHAIYLAANTKGMCFIRTSQPETAVIYT 486 

EDG SQMALEDLAMFRS+P TVFYPSD ++TE A+ LAANTKG+CFIRTS+PE A+IY+ 
Sbjct: 423 EDGPSQMALEDLAMFRSVPMSTVFYPSDGVATEKAVELAANTKGICFIRTSRPENAIIYS 482 

Query: 487 PQENFEIGQAKVVRHGVNDKVTVIGAGVTLHEALEAADHLSQQGISVRVIDPFTIKPLDA 546 

E+F++GQAKVV +D+VTVIGAGVTLHEAL AA+ L + IS+RV+DPFTIKPLD 
Sbjct: 483 NNEDFQVGQAKVVLKSKDDQVTVIGAGVTLHEALAAAESLKKDKISIRVLDPFTIKPLDR 542 

Query: 547 ATIISSAKATGGRVITVEDHYREGGIGEAVCAAVSREPDILVHQLAVSGVPQRGKTSELL 606 

I+ SA+AT GR++TVEDHY EGGIGEAV AAV EP + V +LAVS VP+ GK +ELL 
Sbjct: 543 KLILDSARATKGRILTVEDHYYEGGIGEAVSAAVVGEPGVTVTRLAVSQVPRSGKPAELL 602 

Query: 607 DMFGISTRHIIAAV 620 

MFGI I+ AV 
Sbjct: 603 KMFGIDKDAIVQAV 616 
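
The percentages printed in the HSP summary for this hit (417/614 shown as 67%, 501/614 shown as 81%) appear to be truncated rather than rounded, since 417/614 is about 67.9%. The following is a minimal sketch of that arithmetic; the function name `blast_percent` is hypothetical and the truncation convention is an assumption inferred from the numbers in this listing, not a statement about the BLAST program itself.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Return matches/aligned as a whole-number percentage, truncated.

    Assumption: the report truncates (floors) instead of rounding,
    which is what the figures printed in this listing suggest.
    """
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


# Values taken from the TKT_MOUSE HSP summary above.
identities = blast_percent(417, 614)
positives = blast_percent(501, 614)
print(f"Identities = 417/614 ({identities}%), Positives = 501/614 ({positives}%)")
```

Rounding to the nearest integer would instead give 68% and 82% for the same counts, so the two conventions are distinguishable on this hit.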



Pedant information for DKFZphtes3_17117, frame 1 



Report for DKFZphtes3_17117 . 1 



[LENGTH] 626 
[MW] 67877.52 
[pI] 5.90 
[HOMOL] SWISSPROT:TKT_MOUSE TRANSKETOLASE (EC 2.2.1.1) (TK) (P68). 0.0 
[FUNCAT] m outer membrane and cell wall [M. jannaschii, MJ0681] 3e-48 
[FUNCAT] g carbohydrate metabolism and transport [H. influenzae, HI1023] 9e-36 
[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YPR074c] 5e-32 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YPR074c] 5e-32 
[FUNCAT] 02.07 pentose-phosphate pathway [S. cerevisiae, YPR074c] 5e-32 
[FUNCAT] 01.01.01 amino-acid biosynthesis [S. cerevisiae, YPR074c] 5e-32 
[FUNCAT] i lipid metabolism [H. influenzae, HI1439] 3e-17 
[FUNCAT] c energy conversion [H. influenzae, HI1233] 2e-09 
[FUNCAT] 02.01 glycolysis [S. cerevisiae, YBR221c PDB1 - pyruvate dehydrogenase] 2e-05 
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YBR221c PDB1 - pyruvate dehydrogenase] 2e-05 
[BLOCKS] BL00801F 
[BLOCKS] BL00801E 
[BLOCKS] BL00801D Transketolase proteins 
[BLOCKS] BL00801C Transketolase proteins 
[BLOCKS] BL00801B Transketolase proteins 
[BLOCKS] BL00801A Transketolase proteins 
[SCOP] d1trka2 3.28.1.2.1 Transketolase Transketolase, C-terminal domai 1e-21 
[EC] 1.2.4.1 pyruvate dehydrogenase (lipoamide) 8e-11 
[EC] 1.2.4.4 3-Methyl-2-oxobutanoate dehydrogenase (lipoamide) 4e-10 
[EC] 2.2.1.1 Transketolase 0.0 
[EC] 2.2.1.3 Formaldehyde transketolase 1e-20 
[PIRKW] transferase 0.0 
[PIRKW] flavoprotein 2e-07 
[PIRKW] Calvin cycle 1e-40 
[PIRKW] heterotetramer 2e-07 
[PIRKW] pentose phosphate pathway [illegible] 
[PIRKW] magnesium 1e-40 
[PIRKW] thiamine pyrophosphate 0.0 
[PIRKW] oxidoreductase [illegible] 
[PIRKW] fatty acid biosynthesis 4e-10 
[PIRKW] mitochondrion [illegible] 
[PIRKW] peroxisome 1e-20 
[PIRKW] homodimer 1e-40 


[SUPFAM] pyruvate dehydrogenase (lipoamide) alpha chain 1e-06 
[SUPFAM] ferredoxin 2[4Fe-4S]-related protein 8e-47 
[SUPFAM] thiamine pyrophosphate-binding domain homology 0.0 
[SUPFAM] pyruvate dehydrogenase (lipoamide) 6e-08 
[SUPFAM] ferredoxin 2[4Fe-4S] homology 8e-47 
[SUPFAM] hypothetical protein C2814 2e-21 
[SUPFAM] transketolase 0.0 
[PROSITE] ATP_GTP_A 1 
[PFAM] Transketolase 
[KW] Alpha_Beta 
[KW] 3D 
[KW] LOW_COMPLEXITY 3.04 % 



SEQ MMANDAKPDVKTVQVLRDTANRLRIHSIRATCASGSGQLTSCCSAAEVVSVLFFHTMKYK 

SEG 

IngsB HHHHHHHHHHHHCCCCHHHHHHHHHHHHHHH-HHCCCT 

SEQ QTDPEHPDNDRFILSRGHAAPILYAAWVEVGDISESDLLNLRKLHSDLERHPTPRLPFVD 

SEG 

IngsB TTTTTTTTTCEEEETTGGGHHHHHHHHHHHCTTCHHHHHTTTTTTTTTTTTTTTTTTTTC 

SEQ VATGSLGQGLGTACGMAYTGKYLDKASYRVFCLMGDGESSEGSVWEAFAFASHYNLDNLV 

SEG 

IngsB CCCCTTTHHHHHHHHHHHHHHHHCBTTBTTEEEECHHHHHCHHHHHHHHHHHHHCTTTEE 

SEQ AVFDVNRLGQSGPAPLEHGADIYQNCCEAFGWNTYLVDGHDVEALCQAFWQASQVKNKPT 

SEG 

IngsB EEEEECCEETTEEGGGCCCCCHHHHH-HHHCCEEEETTTTTHHHHHHHHHHHHHTTTTCE 

SEQ AIVAKTFKGRGIPNIEDAENWHGKPVPKERADAIVKLIESQIQTNENLIPKSPVEDSPQI 

SEG 

IngsB EEEEECTTTTTTCCHHHHHHHHHHTCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCHHH 

SEQ SITDIKMTSPPAYKVGDKIATQKTYGLALAKLGRANERVIVLSGDTMNSTFSEIFRKEHP 

SEG 

IngsB HHHHHHHHHTCCCTTTTCBCHHHHHHHHHHHHHTTTTTEEEEETTTHHHHCCTTCEECCG 

SEQ ERFIECIIAEQNMVSVALGCATRGRTIAFAGAFAAFFTRAFDQLRMGAISQANINLIGSH 

SEG xxxxxxxxxxxxxxxxxxx 

IngsB GCEEETTTTHHHHHHHHHHHHHHTTTTEEEEEEGGGGGGGHHHHHHHHHHCTTTEEEEEC 

SEQ CGVSTGEDGVSQMALEDLAMFRSIPNCTVFYPSDAISTEHAIYLAANTKGMCFIRTSQPE 

SEG 

IngsB CCGGGTTTTTTTTCCHHHHHHHCTTTTEEECCCCHHHHHHHHHHHTTTTCEEEECCCCCB 

SEQ TAVIYTPQENFEIGQAKVVRHGVNDKVTVIGAGVTLHEALEAADHLSQQGISVRVIDPFT 

SEG 

IngsB CCTTTTCHHHHHCC-CEEEETTTTTTEEEEECCHHHHHHHHHHHHHHHHCCCEEEE 

SEQ IKPLDAATIISSAKATGGRVITVEDHYREGGIGEAVCAAVSREPDILVHQLAVSGVPQRG 

SEG 

IngsB 

SEQ KTSELLDMFGISTRHIIAAVTLTLMK 

SEG 

IngsB 



Prosite for DKFZphtes3_17117 . 1 
PS00017 595->603 ATP_GTP_A PDOC00017 



Pfam for DKFZphtes3_17117 . 1 



HMM_NAME Transketolase 

HMM *vNtIRiLaMDAVEKANSGHPGaPMGMAPMAHVLWqrMMRHNPNDPrWPN 
+N++RI ++ A + +SG ++++++A++ VL++++M+++++DP P+ 
Query 20 ANRLRIHSIRATCASGSGQLTSCCSAAEVVSVLFFHTMKYKQTDPEHPD 68 

HMM RDRFVLSNGHaCMLLYsMWHLyGYDMpMWDLkQFRQWHSrTPGHPEIgHT 

+DRF+LS GHA+++LY+ W + G ++++DL+++R++HS++ +HP ++ 
Query 69 NDRFILSRGHAAPILYAAWVEVGD-ISESDLLNLRKLHSDLERHPTPRLP 117 

HMM PGVEVTTGPLGQGIaNaVWMAIAERnLAATYNRPGFDI f DHYTYCFMGDG 
++ +V+TG+LGQG++ +++++Y++++ D+++++++C+MGDG 

Query 118 FV-DVATGSLGQGLG T ACGMA YTG K Y L D KAS Y R V FC LMGDG 157 

HMM CLMEGISWEACSLAGHMqLGNWIaFYDDNrlSIDGdTdlWFqEDtYakRF 
+ +EG++WEA ++A+H++L+N++A +D NR++++G++++ + D+Y+ + 
Query 158 ESSEGSVWEAFAFASHYNLDNLVAVFDVNRLGQSGPAPLEHGADIYQNCC 207 

HMM EAYGWHVIEVEnDGHDvEelcaAIEeAKaekDRPTLIiCRTVIGYGSPNk 

EA+GW++ +V DGHDVE++C A+ +A +K++PT+I ++T++G+G+PN 
Query 208 EAFGWNTYLV--DGHDVEALCQAFWQASQVKNKPTAIVAKTFKGRGIPNI 255 

HMM QGTHdWHGAPLGeD* 

++ + WHG+P +++ 
Query 256 EDAENWHGKPVPKE 269 

HMM * PqWePnddklATRKASQqaLeaiGPaLPEf WGGSADLTPSNLTrWKGmv 

P++++ +DKIAT K+++ AL+++G A +++ +S+D+ +S++++++ ++ 
Query 311 PAYKV-GDKIATQKTYGLALAKLGRANERVIVLSGDTMNSTFSEIFRKE 358 

HMM WFMPPSISTDCynGNWsGRYIHYGIREHgMgAIMNGIAlHGgNFRPYGGT 

+ + R+I++ I+E++M++++ G+A++G+ ++++ G 

Query 359 H ' PERFI ECI I AEQNMVS VALGCATRGR-TI AFAGA 392 

HMM FMMFyDYARPAIRMAALMelPVIWVWTHDSIGLGEDGPTHQPVEHLAHFR 
F++F+++A++++RM A++ +++++++H++++ GEDG +++++E+LA+FR 
Query 393 FAAFFTRAFDQLRMGAISQANINLIGSHCGVSTGEDGVSQMALEDLAMFR 442 

HMM alPNMsVWRPCDgNETayAWylAvERehTPtiLILSRQNLPQlErNPrqf 

+IPN +V++P+D+ T+ A YLA+++++ +++++S ++ +++++ P + 
Query 443 SIPNCTVFYPSDAISTEHAIYLAANTKGM-CFIRTSQPETAVIYT-PQEN 490 

HMM ekvaRGGYVLkDmdnePDVILIATGSEMELAvaAAKlLadEGIkaRVVSM 
++++++++V + + + V++I++G+++++A++AA+ L+ +GI +RV+++ 
Query 491 FEIGQAKVVRHGVN--DKVTVIGAGVTLHEALEAADHLSQQGISVRVIDP 538 

HMM PCTeWFD kQDeEYReSVLPDhVPqRVaVEmGvtWCWYKYVGqq 

++++++D + ++++R +++DH++ +++++++V ++ +++ + 

Query 539 FTIKPLDAATIISSAKATGGRVITVEDHYR-EGGIGEAVCAAVSREPDIL 587 

HMM Galf GMNrFGESSGKAPpevLYkMFGFTPENI* 

+ +++ +++ ++ +L+ MFG+ +1 

Query 588 VHQLAVSGVPQR GKTSELLDMFGI STRHI 616 
DKFZphtes3_17nl2 



group: transcription factors 

DKFZphtes3_17nl2.1 encodes a novel 804 amino acid protein which is nearly identical to mouse 
and trout SOX-LZ. 

Sox proteins belong to the HMG box superfamily of DNA-binding proteins and are involved in the 
regulation of developmental processes such as germ layer formation, organ development and cell type 
specification. Deletion or mutation of Sox proteins often results in developmental defects and 
congenital disease in humans. Sox proteins perform their function in a complex interplay with 
other transcription factors in a manner highly dependent on cell type and promoter context. 
The new protein is related to the SOX-LZ protein and contains an additional leucine zipper. 

The new protein can find application in modulating/blocking the expression of SOX-controlled 
genes. 
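
The leucine zipper mentioned above corresponds to the Prosite LEUCINE_ZIPPER pattern (PS00029), L-x(6)-L-x(6)-L-x(6)-L. As a minimal sketch, the pattern can be located in residues 151-250 of the peptide given in this entry with a regular expression. The helper name `find_zippers` and the choice of segment are illustrative assumptions; this is not the Prosite scanning tool used to produce the report.

```python
import re

# Residues 151-250 of the 804-residue peptide, copied from this entry.
SEGMENT_START = 151
segment = "".join("""
EDSSCMEKLL SKDWKEKMER LNTSELLGEI KGTPESLAEK ERQLSTMITQ
LISLREQLLA AHDEQKKLAA SQIEKQRQQM DLARQQQEQI ARQQQQLLQQ
""".split())

# PS00029: L-x(6)-L-x(6)-L-x(6)-L. The lookahead lets overlapping
# candidates be reported as well.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_zippers(seq: str, offset: int = 1):
    """Return (1-based start, matched residues) for each pattern hit."""
    return [(m.start() + offset, m.group(1)) for m in LEUCINE_ZIPPER.finditer(seq)]


hits = find_zippers(segment, SEGMENT_START)
for start, residues in hits:
    print(start, start + len(residues) - 1, residues)
```

On this segment the only hit starts at residue 187 and covers 22 residues (187 to 208). The Prosite table for this clone lists the motif as 187->209, so the end coordinate in the report appears to be one past the last matched residue.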



nearly identical to mouse SOX-LZ 

complete cDNA, complete cds, few EST hits 

mouse and trout SOX-LZ, involved in spermatogenesis 

Sequenced by GBF 

Locus: unknown 

Insert length: 2802 bp 

Poly A stretch at pos. 2692, polyadenylation signal at pos. 2660 



1 GGGATAGGAA AGATGAAAGG TCATGGTGAG CTTCAAGGAC ATGAAAGGTT 
51 GTTGTCTCAT GTAACAATAG TAGATTGTTT TTTTTCCTAA TATTTCTAGC 
101 CAGCCCCTAA GTCAGGTGAT GGAACAAATA CCTACAGTTT AGTCAGGTGA 
151 AACAGGAGTG GGTGGAGGAA GGAAAGAAGA AAAATGGGAA GAATGTCTTC 
201 CAAGCAAGCC ACCTCTCCAT TTGCCTGTGC AGCTGATGGA GAGGATGCAA 
251 TGACCCAGGA TTTAACCTCA AGGGAAAAGG AAGAGGGCAG TGATCAACAT 
301 GTGGCCTCCC ATCTGCCTCT GCACCCCATA ATGCACAACA AACCTCACTC 
351 TGAGGAGCTA CCAACACTTG TCAGTACCAT TCAACAAGAT GCTGACTGGG 
401 ACAGCGTTCT GTCATCTCAG CAAAGAATGG AATCAGAGAA TAATAAGTTA 
451 TGTTCCCTAT ATTCCTTCCG AAATACCTCT ACCTCACCAC ATAAGCCTGA 
501 CGAAGGGAGT CGGGACCGTG AGATAATGAC CAGTGTTACT TTTGGAACCC 
551 CAGAGCGCCG CAAAGGGAGT CTTGCCGATG TGGTGGACAC ACTGAAACAG 
601 AAGAAGCTTG AGGAAATGAC TCGGACTGAA CAAGAGGATT CCTCCTGCAT 
651 GGAAAAACTA CTTTCAAAAG ATTGGAAGGA AAAAATGGAA AGACTAAATA 
701 CCAGTGAACT TCTTGGAGAA ATTAAAGGTA CACCTGAGAG CCTGGCAGAA 
751 AAAGAACGGC AGCTCTCCAC CATGATTACC CAGCTGATCA GTTTACGGGA 
801 GCAGCTACTG GCAGCGCATG ATGAACAGAA AAAACTGGCA GCGTCACAAA 
851 TTGAGAAACA ACGGCAGCAA ATGGACCTTG CTCGCCAACA GCAAGAACAG 
901 ATTGCGAGAC AACAGCAGCA ACTTCTGCAA CAGCAGCACA AAATTAATCT 
951 CCTGCAGCAA CAGATCCAGG TTCAGGGTCA CATGCCTCCG CTCATGATCC 
1001 CAATTTTTCC ACATGACCAG CGGACTCTGG CAGCAGCTGC TGCTGCCCAA 
1051 CAGGGATTCC TCTTCCCCCC TGGAATAACA TACAAACCAG GTGATAACTA 
1101 CCCCGTACAG TTCATTCCAT CAACAATGGC AGCTGCTGCT GCTTCTGGAC 
1151 TCAGCCCTTT ACAGCTCCAG CAGCTCTATG CCGCTCAGCT GGCCAGCATG 
1201 CAGGTGTCAC CTGGAGCAAA GATGCCATCA ACTCCACAGC CACCAAACAC 
1251 AGCAGGGACG GTCTCACCTA CTGGGATAAA AAATGAAAAG AGAGGGACCA 
1301 GCCCTGTAAC TCAAGTTAAG GATGAAGCAG CAGCACAGCC TCTGAATCTC 
1351 TCATCCCGAC CCAAGACAGC AGAGCCTGTA AAGTCCCCAA CGTCTCCCAC 
1401 CCAGAACCTC TTCCCAGCCA GCAAAACCAG CCCTGTCAAT CTGCCAAACA 
1451 AAAGCAGCAT CCCTAGCCCC ATTGGAGGAA GCCTGGGAAG AGGATCCTCT 
1501 TTAGGTAAAT GGAAAAGTCA ACACCAGGAA GAGACTTACG AATTAGATAT 
1551 CCTATCTAGT CTCAACTCCC CTGCCCTTTT TGGGGATCAG GATACAGTGA 
1601 TGAAAGCCAT TCAGGAGGCG CGGAAGATGC GAGAGCAGAT CCAGCGGGAG 
1651 CAACAGCAGC AACAGCCACA TGGTGTTGAC GGGAAACTGT CCTCCATAAA 
1701 TAATATGGGG CTGAACAGCT GCAGGAATGA AAAGGAAAGA ACGCGCTTTG 
1751 AGAATTTGGG GCCCCAGTTA ACGGGAAAGT CAAATGAAGA TGGAAAACTG 
1801 GGCCCAGGTG TCATCGACCT TACTCGGCCA GAAGATGCAG AGGGAAGTAA 
1851 AGCAATGAAT GGCTCTGCAG CTAAACTACA GCAGTATTAT TGTTGGCCAA 
1901 CAGGAGGTGC CACTGTGGCT GAAGCACGAG TCTACAGGGA CGCCCGCGGC 
1951 CGTGCCAGCA GCGAGCCACA CATTAAGCGA CCAATGAATG CATTCATGGT 
2001 TTGGGCAAAG GATGAGAGGA GAAAAATCCT TCAGGCCTTC CCCGACATGC 
2051 ATAACTCCAA CATTAGCAAA ATCTTAGGAT CTCGCTGGAA ATCAATGTCC 
2101 AACCAGGAGA AGCAACCTTA TTATGAAGAG CAGGCCCGGC TAAGCAAGAT 
2151 CCACTTAGAG AAGTACCCAA ACTATAAATA CAAACCCCGA CCGAAACGCA 
2201 CCTGCATTGT TGATGGCAAA AAGCTTCGGA TTGGGGAGTA TAAGCAACTG 
2251 ATGAGGTCTC GGAGACAGGA GATGAGGCAG TTCTTTACTG TGGGGCAACA 
2301 GCCTCAGATT CCAATCACCA CAGGAACAGG TGTTGTGTAT CCTGGTGCTA 
2351 TCACTATGGC AACTACCACA CCATCGCCTC AGATGACATC TGACTGCTCT 
2401 AGCACCTCGG CCAGCCCGGA GCCCAGCCTC CCGGTCATCC AGAGCACTTA 
2451 TGGTATGAAG ACAGATGGCG GAAGCCTAGC TGGAAATGAA ATGATCAATG 
2501 GAGAGGATGA AATGGAAATG TATGATGACT ATGAAGATGA CCCCAAATCA 
2551 GACTATAGCA GTGAAAATGA AGCCCCGGAG GCTGTCAGTG CCAACTGAGG 
2601 AGTTTTTGTT TGCTGAATTA AAGTACTCTG ACATTTCACC CCCCTCCCCA 
2651 ACAAAGAGTT ATTAAAGAGC CCGCATGCAT TTGTGGCTCC ACAATTAAAA 
2701 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2751 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2801 AA 



BLAST Results 



No BLAST result 



Medline entries 



95311974: 

A gene that is related to SRY and is expressed in the testes 
encodes a leucine zipper-containing protein. 

96032826: 

The Sry-related HMG box-containing gene Sox6 is expressed in the adult 
testis and developing nervous system of the mouse. 



Peptide information for frame 1 



ORF from 184 bp to 2595 bp; peptide length: 804 
Category: strong similarity to known protein 
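
The ORF coordinates and the peptide length quoted above can be cross-checked with simple arithmetic: the span 184..2595 is 2412 bp, which is exactly 804 codons, and a start at position 184 falls in reading frame 1. The sketch below shows that check; the function names are hypothetical, and the conclusion that the quoted range excludes the stop codon is an inference from these numbers rather than something the listing states.

```python
def orf_codons(start_bp: int, end_bp: int) -> int:
    """Number of whole codons in an ORF given 1-based inclusive coordinates."""
    span = end_bp - start_bp + 1
    if span % 3 != 0:
        raise ValueError(f"span of {span} bp is not a multiple of 3")
    return span // 3


def reading_frame(start_bp: int) -> int:
    """Forward reading frame (1, 2 or 3) of a 1-based start position."""
    return (start_bp - 1) % 3 + 1


# Values for this entry: ORF from 184 bp to 2595 bp, peptide length 804.
codons = orf_codons(184, 2595)
frame = reading_frame(184)
print(f"{codons} codons in frame {frame}")
```

Because the codon count equals the stated peptide length, the range appears to end at the last coding base, with the stop codon lying just outside it.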



1 MGRMSSKQAT SPFACAADGE DAMTQDLTSR EKEEGSDQHV ASHLPLHPIM 
51 HNKPHSEELP TLVSTIQQDA DWDSVLSSQQ RMESENNKLC SLYSFRNTST 
101 SPHKPDEGSR DREIMTSVTF GTPERRKGSL ADVVDTLKQK KLEEMTRTEQ 
151 EDSSCMEKLL SKDWKEKMER LNTSELLGEI KGTPESLAEK ERQLSTMITQ 
201 LISLREQLLA AHDEQKKLAA SQIEKQRQQM DLARQQQEQI ARQQQQLLQQ 
251 QHKINLLQQQ IQVQGHMPPL MIPIFPHDQR TLAAAAAAQQ GFLFPPGITY 
301 KPGDNYPVQF IPSTMAAAAA SGLSPLQLQQ LYAAQLASMQ VSPGAKMPST 
351 PQPPNTAGTV SPTGIKNEKR GTSPVTQVKD EAAAQPLNLS SRPKTAEPVK 
401 SPTSPTQNLF PASKTSPVNL PNKSSIPSPI GGSLGRGSSL GKWKSQHQEE 
451 TYELDILSSL NSPALFGDQD TVMKAIQEAR KMREQIQREQ QQQQPHGVDG 
501 KLSSINNMGL NSCRNEKERT RFENLGPQLT GKSNEDGKLG PGVIDLTRPE 
551 DAEGSKAMNG SAAKLQQYYC WPTGGATVAE ARVYRDARGR ASSEPHIKRP 
601 MNAFMVWAKD ERRKILQAFP DMHNSNISKI LGSRWKSMSN QEKQPYYEEQ 
651 ARLSKIHLEK YPNYKYKPRP KRTCIVDGKK LRIGEYKQLM RSRRQEMRQF 
701 FTVGQQPQIP ITTGTGVVYP GAITMATTTP SPQMTSDCSS TSASPEPSLP 
751 VIQSTYGMKT DGGSLAGNEM INGEDEMEMY DDYEDDPKSD YSSENEAPEA 
801 VSAN 



BLASTP hits 



Entry MMSOXLZ2_1 from database TREMBL: 

product: "SOX-LZ"; Mouse mRNA for SOX-LZ, complete cds. 

Score = 3910, P = 0.0e+00, identities = 764/801, positives = 774/801 

Entry 151083 from database PIR: 
SOX-LZ - rainbow trout 

Score = 1774, P = 1.1e-287, identities = 365/532, positives = 431/532 

Entry S59121 from database PIR: 
SOX6 protein - mouse 

Score = 2319, P = 1.2e-240, identities = 489/660, positives = 527/660 

Entry AB006330_1 from database TREMBL: 

gene: "mSoxSL"; product: "SOX5"; Mus musculus mSoxSL mRNA, complete 
cds. 

Score = 1212, P = 8.9e-209, identities = 274/457, positives = 324/457 

Entry MMU010604_1 from database TREMBL: 

gene: "sox5"; product: "L-Sox5 protein"; Mus musculus mRNA for 
transcription factor L-Sox5 

Score = 879, P = 4.2e-195, identities = 190/281, positives = 218/281 
Alert BLASTP hits for DKFZphtes3_17nl2, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_17nl2, frame 1 

Report for DKFZphtes3_17nl2 . 1 



[LENGTH] 804 
[MW] 89332.69 
[pI] 6.97 
[HOMOL] TREMBL:MMSOXLZ2_1 product: "SOX-LZ"; Mouse mRNA for SOX-LZ, complete cds. 0.0 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YKL032c] 8e-07 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL032c] 8e-07 
[FUNCAT] 01.07.07 regulation of vitamins, cofactors, and prosthetic groups [S. cerevisiae, YPR065w] 5e-06 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YBR089c-a] 7e-06 
[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YBR089c-a] 7e-06 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YBR089c-a] 7e-06 
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YMR072w] 2e-04 
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YMR072w] 2e-04 
[SCOP] d1hmf 1.20.1.1.1 HMG1, fragments A and B [rat/hamster (Rattu 1e-13 
[SCOP] d1lefa_ 1.20.1.1.6 Lymphoid enhancer-binding factor, LEF1 [mous 4e-15 
[SCOP] d1hrya_ 1.20.1.1.4 SRY [Human (Homo sapiens) 7e-17 
[PIRKW] DNA binding 4e-94 
[PIRKW] T-cell receptor 4e-07 
[PIRKW] leucine zipper 1e-38 
[PIRKW] alternative splicing 2e-07 
[PIRKW] transcription factor 4e-16 
[PIRKW] transcription regulation 1e-12 
[SUPFAM] HMG box homology 0.0 
[SUPFAM] unassigned HMG box proteins 4e-94 
[PROSITE] ATP_GTP_A 1 
[PROSITE] LEUCINE_ZIPPER 1 
[PROSITE] MYRISTYL 6 
[PROSITE] AMIDATION 1 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 14 
[PROSITE] PKC_PHOSPHO_SITE 10 
[PROSITE] ASN_GLYCOSYLATION 6 
[PFAM] HMG (high mobility group) box 
[KW] Irregular 
[KW] 3D 
[KW] LOW_COMPLEXITY 13.81 % 
[KW] COILED_COIL 3.48 % 



SEQ MGRMSSKQATSPFACAADGEDAMTQDLTSREKEEGSDQHVASHLPLHPIMHNKPHSEELP 

SEG 

COILS 

lnhm- 

SEQ TLVSTIQQDADWDSVLSSQQRMESENNKLCSLYSFRNTSTSPHKPDEGSRDREIMTSVTF 

SEG 

COILS 

lnhm- 

SEQ GTPERRKGSLADVVDTLKQKKLEEMTRTEQEDSSCMEKLLSKDWKEKMERLNTSELLGEI 

SEG 

COILS 

lnhm- 

SEQ KGTPESLAEKERQLSTMITQLISLREQLLAAHDEQKKLAASQIEKQRQQMDLARQQQEQI 

SEG xxxxxxxxxxxxxxx 

COILS CCCCCC 

lnhm- 

SEQ ARQQQQLLQQQHKINLLQQQIQVQGHMPPLMIPIFPHDQRTLAAAAAAQQGFLFPPGITY 

SEG xxxxxxxxxxxxxxxxxxxxxx xxxxxx 

COILS CCCCCCCCCCCCCCCCCCCCCC 

lnhm- 

SEQ KPGDNYPVQFIPSTMAAAAASGLSPLQLQQLYAAQLASMQVSPGAKMPSTPQPPNTAGTV 

SEG xxxxxxxxxxxx 
COILS 

Inhm- 

SEQ SPTGIKNEKRGTSPVTQVKDEAAAQPLNLSSRPKTAEPVKSPTSPTQNLFPASKTSPVNL 

SEG 

COILS 

Inhm- 

SEQ PNKSSIPSPIGGSLGRGSSLGKWKSQHQEETYELDILSSLNSPALFGDQDTVMKAIQEAR 

SEG . . . xxxxxxxxxxxxxxxxxx 

COILS 

Inhm- 

SEQ KMREQIQREQQQQQPHGVDGKLSSINNMGLNSCRNEKERTRFENLGPQLTGKSNEDGKLG 

SEG . .xxxxxxxxxxxx 

COILS 

Inhm- 

SEQ PGVIDLTRPEDAEGSKAMNGSAAKLQQYYCWPTGGATVAEARVYRDARGRASSEPHIKRP 

SEG 

COILS 

Inhm- CCC 

SEQ MNAFMVWAKDERRKILQAFPDMHNSNISKILGSRWKSMSNQEKQPYYEEQARLSKIHLEK 

SEG X 

COILS 

Inhm- CCCHHHHHHHHHHHHHHHTTTTCCHHHHHHHHHHHTTTTTTHHHHHHHHHHHHHHHHHHH 

SEQ YPNYKYKPRPKRTCIVDGKKLRIGEYKQLMRSRRQEMRQFFTVGQQPQIPITTGTGVVYP 

SEG xxxxxxxxxxxx 

COILS 

Inhm- HHHTTTTTTT 

SEQ GAITMATTTPSPQMTSDCSSTSASPEPSLPVIQSTYGMKTDGGSLAGNEMINGEDEMEMY 

SEG xxxxxxx 

COILS 

Inhm- 

SEQ DDYEDDPKSDYSSENEAPEAVSAN 

SEG xxxxxx 

COILS 

Inhm- 



Prosite for DKFZphtes3_17nl2 . 1 



PS00001 97->101 ASN_GLYCOSYLATION PDOC00001 
PS00001 172->176 ASN_GLYCOSYLATION PDOC00001 
PS00001 388->392 ASN_GLYCOSYLATION PDOC00001 
PS00001 422->426 ASN_GLYCOSYLATION PDOC00001 
PS00001 559->563 ASN_GLYCOSYLATION PDOC00001 
PS00001 626->630 ASN_GLYCOSYLATION PDOC00001 
PS00004 126->130 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 369->373 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 5->8 PKC_PHOSPHO_SITE PDOC00005 
PS00005 28->31 PKC_PHOSPHO_SITE PDOC00005 
PS00005 94->97 PKC_PHOSPHO_SITE PDOC00005 
PS00005 136->139 PKC_PHOSPHO_SITE PDOC00005 
PS00005 203->206 PKC_PHOSPHO_SITE PDOC00005 
PS00005 299->302 PKC_PHOSPHO_SITE PDOC00005 
PS00005 390->393 PKC_PHOSPHO_SITE PDOC00005 
PS00005 512->515 PKC_PHOSPHO_SITE PDOC00005 
PS00005 530->533 PKC_PHOSPHO_SITE PDOC00005 
PS00005 692->695 PKC_PHOSPHO_SITE PDOC00005 
PS00006 28->32 CK2_PHOSPHO_SITE PDOC00006 
PS00006 129->133 CK2_PHOSPHO_SITE PDOC00006 
PS00006 146->150 CK2_PHOSPHO_SITE PDOC00006 
PS00006 148->152 CK2_PHOSPHO_SITE PDOC00006 
PS00006 154->158 CK2_PHOSPHO_SITE PDOC00006 
PS00006 186->190 CK2_PHOSPHO_SITE PDOC00006 
PS00006 203->207 CK2_PHOSPHO_SITE PDOC00006 
PS00006 221->225 CK2_PHOSPHO_SITE PDOC00006 
PS00006 520->524 CK2_PHOSPHO_SITE PDOC00006 
PS00006 533->537 CK2_PHOSPHO_SITE PDOC00006 
PS00006 547->551 CK2_PHOSPHO_SITE PDOC00006 
PS00006 577->581 CK2_PHOSPHO_SITE PDOC00006 
PS00006 639->643 CK2_PHOSPHO_SITE PDOC00006 
PS00006 793->797 CK2_PHOSPHO_SITE PDOC00006 
PS00008 182->188 MYRISTYL PDOC00008 
PS00008 431->437 MYRISTYL PDOC00008 
PS00008 437->443 MYRISTYL PDOC00008 
PS00008 509->515 MYRISTYL PDOC00008 
PS00008 575->581 MYRISTYL PDOC00008 
PS00008 762->768 MYRISTYL PDOC00008 
PS00009 677->681 AMIDATION PDOC00009 
PS00017 526->534 ATP_GTP_A PDOC00017 
PS00029 187->209 LEUCINE_ZIPPER PDOC00029 



Pfam for DKFZphtes3_17nl2 . 1 



HMM_NAME HMG (high mobility group) box 

HMM *PKRPMNAYMLWMQEMReklKaENPNdMhNtEISKMiGEMWKnMsEEEKm 
+KRPMNA+M+W+++ R+KI + P DMHN++ISK++G +WK+MS +EK+ 
Query 597 IKRPMNAFMVWAKDERRKILQAFP-DMHNSNISKILGSRWKSMSNQEKQ 644 

HMM PYEdMAeeEKqRYMKEMPeYK* 
PY+++ +++ + +++ +P+YK 
Query 645 PYYEEQARLSKIHLEKYPNYK 665 
DKFZphtes3_17nl8 



group: intracellular transport and trafficking 

DKFZphtes3_17nl8 encodes a novel 782 amino acid protein with weak partial similarity to known 
proteins. 

The novel protein contains an ATP/GTP-binding site motif A (P-loop) and a TonB-dependent 
receptor protein signature 1. In E. coli, the TonB protein interacts with outer membrane 
receptor proteins that mediate uptake of specific substrates into the periplasmic space. In 
the absence of TonB these receptors bind their substrates but do not carry out active 
transport. The novel protein seems to be involved in ATP-dependent transport of substances 
into the cell. 

The new protein can find application in modulation of cell permeability and transport of 
suitable substrates into the cell. 
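
The ATP/GTP-binding site motif A (P-loop) referred to above is the Prosite pattern PS00017, [AG]-x(4)-G-K-[ST]. A minimal sketch of how the reported hit can be reproduced is to scan the first 150 residues of the peptide given for this clone with an equivalent regular expression. The helper name `find_p_loops` is hypothetical and the script is only an illustration, not the tool that generated the report.

```python
import re

# Residues 1-150 of the 782-residue peptide listed for this clone.
n_terminal = "".join("""
MARQVRTHQE TLNRFQQQSI HLLTELLRLK MKAMVESMSV GANPLDITRR
FVEASQLLHL NAKEMAFNCL ISTAGRSGYS SGQLWKESLA NMSAIGVNSP
YQLIYHSSTA CLSFSLSAGK EAKKKIGKSR TTEDVSMPPL HRGVGTPANS
""".split())

# PS00017: [AG]-x(4)-G-K-[ST]
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(seq: str):
    """Return (1-based start, matched residues) for each P-loop hit."""
    return [(m.start() + 1, m.group()) for m in P_LOOP.finditer(seq)]


hits = find_p_loops(n_terminal)
for start, residues in hits:
    print(start, start + len(residues) - 1, residues)
```

The single hit begins at residue 122 and spans eight residues (122 to 129), which agrees with the start of the range 122-130 quoted for ATP_GTP_A in this entry; the quoted end appears to be one past the last matched residue.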



unknown receptor 

protein contains TONB_DEPENDENT_REC_1 pattern and ATP_GTP_A pattern 

Sequenced by GBF 

Locus: unknown 

Insert length: 2853 bp 

Poly A stretch at pos. 2806, no polyadenylation signal found 



1 GTCCTTTTAA GTCAGTAAAT TGAACTAAGT CGGTTATTCG GCAAGCAGTT 
51 CCTATAAAAA ACTACATGGC TAAGGTTCTT AATGATTGAC CACAAGCAGA 
101 TCTTTCACCC TCGGATCTCT AGCTACAAAA GGTCCCCACA CTGAAGAAGC 
151 CACTACCTCC ACCACCACCA GCACCACCAC GTCCAGTGCT GCTGGCAACC 
201 ACTGGGGCAG CCAAGCGCTC CACCCTCTCT CCCACCATGG CCCGTCAGGT 
251 GCGCACCCAC CAGGAGACCC TGAACAGGTT TCAGCAGCAG TCCATCCACC 
301 TGCTGACGGA GCTCCTCAGA CTGAAGATGA AGGCCATGGT GGAGTCTATG 
351 TCGGTGGGTG CCAACCCCTT GGACATCACC AGGCGCTTTG TGGAGGCCAG 
401 CCAGCTCCTC CACCTCAATG CCAAGGAGAT GGCCTTCAAC TGCCTGATCA 
451 GCACAGCCGG GAGAAGTGGC TACAGCAGCG GACAGTTGTG GAAAGAGTCC 
501 CTCGCAAACA TGTCCGCCAT TGGGGTGAAC TCGCCTTACC AGCTGATCTA 
551 CCACTCTTCC ACAGCCTGTC TGAGCTTTTC TCTCTCTGCT GGAAAAGAAG 
601 CCAAGAAGAA AATAGGCAAA TCTAGAACTA CAGAAGATGT CAGCATGCCG 
651 CCCCTGCATC GAGGAGTGGG AACCCCTGCC AACAGCCTGG AGTTCAGCGA 
701 CCCCTGCCCT GAGGCCCGGG AGAAGCTGCA GGAGTTGTGT CGCCACATAG 
751 AAGCTGAAAG GGCCACATGG AAAGGGAGGA ATATCTCCTA CCCCATGATC 
801 TTACGAAACT ACAAGGCAAA GATGCCCTCT CATCTAATGT TGGCCCGCAA 
851 AGGAGACTCT CAGACCCCGG GTTTACATTA CCCTCCCACT GCAGGTGCTC 
901 AGACTCTCAG CCCCACCTCT CACCCATCTT CTGCCAACCA TCATTTCAGT 
951 CAGCATTGTC AAGAGGGGAA GGCACCCAAG AAGGCCTTCA AGTTTCATTA 
1001 CACCTTCTAT GATGGCTCCT CCTTCGTTTA CTATCCCTCT GGAAACGTCG 
1051 CTGTATGTCA GATCCCCACA TGCTGCAGAG GGAGAACCAT CACCTGCCTC 
1101 TTTAATGACA TACCTGGATT CTCCTTGCTG GCCCTATTCA ATACTGAAGG 
1151 CCAGGGCTGT GTTCACTACA ACCTAAAAAC CAGTTGCCCA TATGTCTTAA 
1201 TCTTGGATGA GGAAGGTGGG ACCACCAATG ACCAGCAGGG CTATGTAGTC 
1251 CACAAGTGGA GCTGGACTTC CAGGACAGAG ACCCTGCTTT CCCTGGAATA 
1301 CAAGGTGAAT GAGGAAATGA AACTAAAGGT ACTGGGACAG GACTCCATCA 
1351 CAGTCACCTT CACCTCCCTG AATGAGACAG TAACACTCAC TGTGTCGGCC 
1401 AACAATTGTC CCCATGGAAT GGCATATGAC AAACGGCTGA ACCGCAGAAT 
1451 CAGCAACATG GACGACAAGG TGTATAAGAT GAGCCGAGCC CTGGCTGAGA 
1501 TCAAGAAGCG GTTTCAGAAG ACAGTGACTC AGTTCATTAA TTCTATCTTG 
1551 CTGGCCGCAG GTCTGTTTAC CATTGAATAT CCCACCAAAA AGGAGGAGGA 
1601 AGAATTTGTT CGGTTCAAGA TGAGATCCAG AACTCATCCC GAGCGGCTCC 
1651 CCAAGCTAAG TTTATACTCA GGAGAAAGTC TTTTACGATC TCAGTCAGGC 
1701 CACCTGGAAT CCTCAATTGC AGAGACTTTG AAGGATGAGC CTGAGTCTGC 
1751 TCCTGTGAGC CCAGTTCGGA AGACCACCAA AATCCACACC AAAGCCAAGG 
1801 TCACATCCAG AGGGAAGGCC CGCGAGGGGC GCAGCCCCAC CAGGTGGGCG 
1851 GCCTTGCCCT CAGACTGCCC GCTGGTGCTG CGGAAGCTCA TGCTCAAGGA 
1901 AGACACCCGT GCTGGCTGCA AGTGCCTGGT GAAGGCGCCC CTGGTCTCTG 
1951 ACGTGGAGCT GGAGCGCTTC CTGTTGGCGC CCCGAGACCC CAGCCAAGTG 
2001 CTGGTGTTTG GGATCATCTC AAGCCAGAAC TACACCAGCA CTGGGCAGCT 
2051 CCAGTGGCTG CTGAACACTC TCTACAACCA CCAGCAGCGG GGCCGTGGCT 
2101 CCCCCTGCAT CCAGTGCCGG TATGACTCCT ACCGCCTGCT GCAGTATGAC 
2151 CTGGACAGCC CCCTGCAGGA GGACCCTCCC CTGATGGTGA AGAAGAACTC 
2201 TGTGGTGCAG GGGATGATTC TGATGTTTGC CGGGGGGAAG CTCATTTTTG 
2251 GGGGCCGTGT TTTGAATGGA TATGGCCTCA GCAAGCAGAA TCTGCTGAAA 
2301 CAGATCTTCC GGTCTCAACA GGATTACAAG ATGGGCTACT TCCTGCCGGA 
2351 TGACTACAAA TTCAGTGTTC CCAACTCTGT CCTGAGCCTG GAGGATTCTG 
2401 AATCAGTCAA GAAAGCCGAG TCAGAAGATA TCCAAGGAAG CAGCTCCTCA 
2451 TTGGCCCTGG AAGACTATGT GGAGAAGGAG TTATCTCTGG AGGCTGAGAA 

2501 GACAAGAGAG CCTGAAGTGG AGCTACATCC TCTCAGCAGG GACAGCAAGA 

2551 TAACTAGTTG GAAGAAGCAG GCCTCCAAGA AGTAGCGCCA TCCTGGCAGC 

2601 AGCCAAGTGA GCCAGGCCCC GGCCCGGGGT GCTGGGGCTT CTTGCCAGCC 

2651 CAGCCCTGCC TCCCCGGTCT CCCACCCTGr CCTCCAAGCT TCTATAATAA 

2701 ACCAGCGGGC CTCCAGCATT GGGGTGAGGC TCTGGGGAAG GACAAAAAAA 

2751 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAGGG 

2801 CGGCCGAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAGGGCGG 

2851 CCG 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 237 bp to 2582 bp; peptide length: 782 

Category: putative protein 

Prosite motifs: ATP_GTP_A (122-130) 

TONB_DEPENDENT_REC_1 (1-44) 



1 MARQVRTHQE TLNRFQQQSI HLLTELLRLK MKAMVESMSV GANPLDITRR 
51 FVEASQLLHL NAKEMAFNCL ISTAGRSGYS SGQLWKESLA NMSAIGVNSP 
101 YQLIYHSSTA CLSFSLSAGK EAKKKIGKSR TTEDVSMPPL HRGVGTPANS 
151 LEFSDPCPEA REKLQELCRH IEAERATWKG RNISYPMILR NYKAKMPSHL 
201 MLARKGDSQT PGLHYPPTAG AQTLSPTSHP SSANHHFSQH CQEGKAPKKA 
251 FKFHYTFYDG SSFVYYPSGN VAVCQIPTCC RGRTITCLFN DIPGFSLLAL 
301 FNTEGQGCVH YNLKTSCPYV LILDEEGGTT NDQQGYVVHK WSWTSRTETL 
351 LSLEYKVNEE MKLKVLGQDS ITVTFTSLNE TVTLTVSANN CPHGMAYDKR 
401 LNRRISNMDD KVYKMSRALA EIKKRFQKTV TQFINSILLA AGLFTIEYPT 
451 KKEEEEFVRF KMRSRTHPER LPKLSLYSGE SLLRSQSGHL ESSIAETLKD 
501 EPESAPVSPV RKTTKIHTKA KVTSRGKARE GRSPTRWAAL PSDCPLVLRK 
551 LMLKEDTRAG CKCLVKAPLV SDVELERFLL APRDPSQVLV FGIISSQNYT 
601 STGQLQWLLN TLYNHQQRGR GSPCIQCRYD SYRLLQYDLD SPLQEDPPLM 
651 VKKNSVVQGM ILMFAGGKLI FGGRVLNGYG LSKQNLLKQI FRSQQDYKMG 
701 YFLPDDYKFS VPNSVLSLED SESVKKAESE DIQGSSSSLA LEDYVEKELS 
751 LEAEKTREPE VELHPLSRDS KITSWKKQAS KK 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_17nl8, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_17nl8, frame 3 



Report for DKFZphtes3_17nl8 . 3 



[LENGTH] 782 

[MW] 88030.16 

[pI] 9.22 

[BLOCKS] BL00286 Squash family of serine protease inhibitors proteins 

[PROSITE] ATP_GTP_A 1 

[PROSITE] MYRISTYL 4 

[PROSITE] CAMP_PHOSPHO_SITE 

[PROSITE] CK2_PHOSPHO_SITE 

[PROSITE] PROKAR_LIPOPROTEIN 

[PROSITE] TONB_DEPENDENT_REC_1 

[PROSITE] PKC_PHOSPHO_SITE 

[PROSITE] ASN_GLYCOSYLATION 

[KW] Alpha_Beta 






SEQ MARQVRTHQETLNRFQQQSIHLLTELLRLKMKAMVESMSVGANPLDITRRFVEASQLLHL 

PRD ccchhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhh 

SEQ NAKEMAFNCLISTAGRSGYSSGQLWKESLANMSAIGVNSPYQLIYHSSTACLSFSLSAGK 

PRD hhhhhhhhhhhhcccccccccccchhhhhhhhhcccccccceeeecccceeeecccccch 

SEQ EAKKKIGKSRTTEDVSMPPLHRGVGTPANSLEFSDPCPEAREKLQELCRHIEAERATWKG 

PRD hhhhhhhcccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhc 

SEQ RNISYPMILRNYKAKMPSHLMLARKGDSQTPGLHYPPTAGAQTLSPTSHPSSANHHFSQH 

PRD cccccchhhhhhhhcccccceeeccccccccccccccccccccccccccccccccccccc 

SEQ CQEGKAPKKAFKFHYTFYDGSSFVYYPSGNVAVCQIPTCCRGRTITCLFNDIPGFSLLAL 

PRD ccccccchhhhheeeecccccceeeecccceeeeeccccccceeeeeeccccccceeeee 

SEQ FNTEGQGCVHYNLKTSCPYVLILDEEGGTTNDQQGYVVHKWSWTSRTETLLSLEYKVNEE 

PRD ecccccceeeeeccccccceeeeecccccccccceeeeeeecccchhhhhhhhhhhhhhh 

SEQ MKLKVLGQDSITVTFTSLNETVTLTVSANNCPHGMAYDKRLNRRISNMDDKVYKMSRALA 

PRD hhhhhhccceeeeeeccccceeeeeeecccccccchhhhhhhhhhhcccchhhhhhhhhh 

SEQ EIKKRFQKTVTQFINSILLAAGLFTIEYPTKKEEEEFVRFKMRSRTHPERLPKLSLYSGE 

PRD hhhhhhhhhhhhhhhhhhhhcccceeecccchhhhhhhhhhhccccccccccceeeeccc 

SEQ SLLRSQSGHLESSIAETLKDEPESAPVSPVRKTTKIHTKAKVTSRGKAREGRSPTRWAAL 

PRD eeeecccccchhhhhhhhhccccccccccccccccccceeeeeccccccccccccccccc 

SEQ PSDCPLVLRKLMLKEDTRAGCKCLVKAPLVSDVELERFLLAPRDPSQVLVFGIISSQNYT 

PRD ccccchhhhhhhhhhhhhhhhhhhhccccchhhhhhhhhccccccceeeeeeeecccccc 

SEQ STGQLQWLLNTLYNHQQRGRGSPCIQCRYDSYRLLQYDLDSPLQEDPPLMVKKNSVVQGM 

PRD ccchhhhhhhhhhhhhcccccccceeeecccccceeecccccccccccccccccchhhhh 

SEQ ILMFAGGKLIFGGRVLNGYGLSKQNLLKQIFRSQQDYKMGYFLPDDYKFSVPNSVLSLED 

PRD heeeccccccccccccccccccchhhhhhhhhhhhhccccccccccceeecccceeeccc 

SEQ SESVKKAESEDIQGSSSSLALEDYVEKELSLEAEKTREPEVELHPLSRDSKITSWKKQAS 

PRD chhhhhhhhcccccccccchhhhhhhhhhhhhhhhhcccceeeccccccccccccccccc 

SEQ KK 

PRD cc 



Prosite for DKFZphtes3_17nl8.3 



PS00O01 
PS00001 
PS00001 
PS00001 
PS00OO4 
PS00004 
PS00004 
PS00005 
PS00OO5 
PS00005 
PS00005 
PS00005 
PS00005 
PS00005 
PSO0005 
PS00005 
PS00005 
PS00006 
PS00O06 
PS00O06 
PS00006 
PS00006 
PS00006 
PS00006 
PS00006 
PS00006 
PS00O06 
PS00006 
PSO0O06 
PS00006 
PS00006 
PS00008 
PS00008 
PS00008 
PS00008 



182->186 
379->383 
598->602 
403->407 
511->515 
652->656 



177->180 
344->347 
450->453 
497->500 
513->516 
523->526 
631->634 
723->726 
774->777 



131->135 
256->260 
329->333 
345->349 
377->381 
406->410 
450->454 
466->470 
493->497 
497->501 
571->575 
693->697 
717->721 
145->151 
327->333 
592->598 
734->740 



48->51 



91->95 



7->ll 



asn_glycosylation 
asn_glycosylation 
asn_gl ycos ylat i on 
asn_gl ycos y lat i on 
camp_phos pho_s ite 
camp_phospho_site 
camp_phospho_site 
pkc phosphors ite 
pkc~phospho_site 
pkc_phospho_site 
pkc_phospho_site 
pkc_phos pho_s i te 
pkc_phospho_site 
pkc_phos pho_s i te 
pkc phospho_site 
pkc~phospho_site 
pkc phospho_site 
ck22phospho_site 
ck2_phospho site 
ck2_phospho~site 

CK2 PHOSPHO_SITE 
CK2~PHOSPHO_SITE 
CK2_PHOSPHO_SITE 
CK2_PHOSPHO_SITE 
CK2_PHOSPHO_SITE 
CK2_PHOSPHO SITE 
CK2_PHOSPHO~SITE 
CK2 PHOSPHO_SITE 
CK2~PHOSPHO_SITE 
CK2_PHOSPHO_SITE 
CK2_PHOSPHO_SITE 
MYRISTYL 



MYRISTYL 
MYRISTYL 
MYRISTYL 



PDOC00001 
PDOC00001 
PDOC00001 
PDOC00001 
PDOC00004 
PDOC00004 
PDOC00004 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00005 
.PDOC00005 
PDOC00O05 
PDOC00005 
PDOC00005 
PDOC00006 
PDOC00006 
PDOC00006 
PDOC00006 
PDOC00006 
PDOC00006 
PDOC00006 
PDOC00006 
PDOC00006 
PDOC00006 
PDOC00006 
PDOC00006 
PDOC00006 
PDOC00006 
PDOC00008 
PDOC00008 
PDOC00008 
PDOC00008 
PS00013 101->112 PROKAR_LIPOPROTEIN PDOC00013 
PS00017 122->130 ATP_GTP_A PDOC00017 
PS00430 1->44 TONB_DEPENDENT_REC_1 PDOC00354 



(No Pfam data available for DKFZphtes3_17nl8.3) 

group: testes derived 

DKFZphtes3_18f3 encodes a novel 248 amino acid protein with partial similarity to human 
TNF-inducible protein CG12-1. 

The novel protein contains two leucine zippers. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to TNF-inducible protein CG12-1 



Sequenced by MediGenomix 



Locus: unknown 



Insert length: 4608 bp 

Poly A stretch at pos. 4570, polyadenylation signal at pos. 4550 



1 GACAGAAGTG AATGGGAATG GAGAGGCCGG CGGCCCGGGA GCCGCATGGG 
51 CCCGACGCGC TGCGGCGCTT CCAGGGACTG CTGCTGGACC GCCGAGGCCG 
101 GCTGCACCGC CAGGTGCTGC GCCTGCGCGA GGTGGCCCGG CGCCTGGAGC 
151 GCCTGCGCAG GCGCTCCCTC GTAGCCAACG TGGCCGGCAG CTCGCTGAGC 
201 GCAACGGGCG CCCTCGCCGC CATCGTGGGG CTCTCGCTCA GCCCGGTCAC 
251 CCTGGGGACC TCGCTGCTGG TGTCGGCCGT GGGGCTGGGG GTGGCCACAG 
301 CCGGAGGGGC CGTCACCATC ACGTCCGATC TCTCGCTGAT CTTCTGCAAC 
351 TCCCGGGAGC TGCGGAGGGT GCAGGAGATC GCGGCCACCT GCCAGGACCA 
401 GATGCGAGAG ATCCTGAGCT GCCTCGAGTT TTTCTGCCGC TGGCAGGGCT 
451 GCGGGGACCG CCAGCTGCTG CAGTGCGGGA GGAACGCCTC CATCGCCCTG 
501 TACAATTCTG TCTACTTCAT CGTCTTCTTT GGCTCACGTG GCTTCCTCAT 
551 CCCCAGGCGG GCGGAGGGGG ACACCAAGGT TAGCCAGGCC GTGCTGAAGG 
601 CCAAGATTCA GAAACTGGCC GAGAGCCTGG AGTCCTGCAC CGGGGCTCTG 
651 GACGAACTCA GCGAGCAGCT GGAGTCTCGG GTTCAGCTCT GCACCAAGTC 
701 CAGTCGTGGC CACGACCTCA AGATCTCTGC TGACCAGCGT GCAGGGCTGT 
751 TTTTCTGAGA ACATCCTTTC CCCCTAATGA CCGAGGCCAG CAAATCATCC 
801 TCATGGGATG CTCCAGAATT TGTAGCTCCC TTAGGAAAAC ACCAAGCTGG 
851 GTTAGGAGCC GAAGGCAAAG GATGAGAAAA ACTGTTTTTG AAGTGGGCAG 
901 GTCCCCAAAG CCCTTCTTTT CCCATCACTG TGACATCTGC CTGGGCTTGA 
951 GTGCTACGGA CTTTTCAGTC TTCCTAGTGG AAAAATGTGA CCCAAAAACT 
1001 CCTTTTCCTT TATCAAAAAC TTTCTGTCTA AACACAGCTG GGCAGGCACT 
1051 CCTGTTTTAA AGTTATTTCG GGGTCCCTGA CCCTGCCCTG GTGGCTTGGC 
1101 CTGAGACTGG AGAGAGTGCC ATCCTCTGGG TCCTCTCCAA GTCCTACTAG 
1151 TCTTTGAAGT CCTCAAAATG TGCGTGAGGA AGGCATTTGC CTCTATTCCA 
1201 GAATTTCTGA TACAAAGAAC TCCAGAATCC AGAGCAAATC AGCCCTTCTC 
1251 TGAACGTTGT AGGATGGTTC AGAACCCAGA GAGGACCCTG GTGCTGATAT 
1301 CTCCTCCTCT TCCCTTTCCC CTCAGCTTAC TTACTCCCAG ATGCGGCCTG 
1351 GGTATGAAGT AGGCCTTTCC TGAGTGGCTC CCAATCCAGT CCTCCAAGTA 
1401 CTCAGAGGGG AAGCCCGTGA AGCCGTCATC TAAGTCCTGC TCCCTCACAT 
1451 GAAGCTGAGG GCCAGATAGA TGGAGCGACT GCCAACTTCA TTTCCCGACA 
1501 TCATTGTGTT CAGAAGAGAG TGATGGGTTT TGAGTTAGAC AGTCCTGGGC 
1551 TTGAGACAGG CTTTGTCACT ACTGTGTGAG TGTAGCCACC TAATCTCTCT 
1601 GAGACTGTGT AAAACAAAGA TGATAAAATC TCACCCTGTT GTGAGATATT 
1651 AAATGAGCCA AAGTGCCTAG CATGATGGTG CTGGCTCATA TAGTGTAGTC 
1701 CCTGGAATGG CAAATTAACA TCACCCAGGA ACTTGTTAGA AAGGCAAATT 
1751 CTTGGACACA ACCCTCCTGA TTTATGGAAT CAGAAACTCT GGCTGTGGGG 
1801 CCCAGCAACC TGAGTTTAAA CAATTTCTCT GGGTGGTTCT GCGGCACACT 
1851 AAGGTTTGAA AATCACTACA ACAAATGCTA ACTTCTAATC CCCTTGATGA 
1901 GCTTTCACGA AGTCTCACGG CTTCTCTAGG GACTCCATGG TCTTCAGAGT 
1951 CGTTCACAGA TGACCAAGGA CAGACTGTGT CCCAGAAGCC AAAATGAGAG 
2001 AGAGAGAGAG AGCACGCGTA CGTGCACCCT GGGGCAGTGT CTCACCGTAT 
2051 GAATAAGGGA TGTAACACTA AAAGCCCATT AGGGGGCAGT GTTTCCCGCC 
2101 TGTTGTAGAA ACTGGTACAG AAAGGATCCT ATATGAAGTT CCTGAAACTG 
2151 ACCTTTGTCT ATTATTACCT TCTCTGAAAA GTGCCAGTCC ATGTATTTTT 
2201 TATTTATTTT AAGTTTGTAA TTTAATTTTT AATTATTGTT TAGTGTTTGC 
2251 ATTTAATTTT ATTTAATCAC CACATTTAGA AAATAATAAG AGCAAGTTTC 
2301 TAAATGGGAG ACTGCTGAGG CTCTTTGCAA GAGATGAGAT TAAGTTTGAG 
2351 TTTCTAAGGC AGGGCATGAG CTGGAAATAG CATTGCTTTC CTTGATTGTC 
2401 TCTCTCCTTC AGGGAGATTC TTTTTCTCTA GTGTTTTAAG TGATCCTTTG 
2451 AAGTAAGTGT GGAGAGTCTT GAATGGCAAG ACCAGGAGCT GAGTTTAAGC 
2501 TTGTAATGGA AGCTTGCATT GTGGGATATA TAACTGAGGA AGCATATTTA 
2551 TCCTGAAGGT ATTTTGCCAG AAGGTATCAC TTGACCTGGA AAAGGAATCT 
2601 ATTTAGTTCA GGAAAGATAA AAAGTTTAGA GGTATGTGAA GGAAGCACTT 
2651 AGAACTTGCA AGCCTGATGT CCTATCAAGT TATGTCTTCT GGGTGACAGA 
2701 CAAAATAGCT TGTCTTATGG TGGTGATGTG TTGCATTTTC ACTTTGGGGT 
2751 CTGTAAGAAA CTGTCAGTGA AAATATGTAC AATTCCTTCA ATTTCCATTC 
2801 TTAACAACTG TAATGTTGAA AAATAAGTTG AAAAGTCTTT GGGACCATAC 
2851 ATGCAAAAAC GGTGCCTCTG TTACTTAATT ATTTAATATT CTATAAATGT 
2901 ACCCAATCTG TCCGCACCCT TCCCAGTGAT GGGGCAGTAT GTCTGAGGAA 
2951 GTATAATTTC AGTACTGGGG TCGGGGAGAG GAGGTGATGT TTCTACATTT 
3001 TTATTTTTTC TATAAATTGC AATTGGTCTG TATGCTGGTT TATTTTGAAA 
3051 TTTATATTGG TTTCTTTTCA AGCTGGTGTC ATCTCCTAGA CTGTTTCACC 
3101 CAGATGCTAG CATTTTTTTT TTTTTTGAGA CAGAGTCTCA CTCTGTCACC 
3151 TAGGCTGGAG TTGCAGTGGT TTGATCTCGG CTCACTGCAA CCTCCGACTC 
3201 CTGGGTTCAA GCAATTCTTC TGCCTCAGCC TCCTGAGTAG CTGGGATTAC 
3251 AGATGTGCAC CAGCACACCC GGCTAATTTT TTGTATTTTT AGTAGAGACA 
3301 GGGTTTCGCC ATGTTGGCCA GGCTGGTCTT GAACTCCTGG CCTTATGTGA 
3351 TCCGCCCACC TTGGCTTCCC AAAGTGCTGG GATTACAGGC ATGAGCCACC 
3401 TCGCCTGGCC AGATGCTAGC ATTTTAGATC AAACAATTCA TTTTAGATGA 
3451 ATTGTTTTGT TTCACAATCA TTTTAAATCA TTTTAGAATG TACTTCACAT 
3501 TATTAGTTGT GTTATGGCAT AAAGGTACAA CCATTCCCTA ACTCCATCTT 
3551 TTATTAATGC TTAAGTTTAA ATTATATTCT TCCAATGCCT AAGCTATTCC 
3601 CTAGAATTAA ACTGGGCACT TTTGGAAGCA GCAACAGTAA CAGCAGCAGC 
3651 AAACTTTTCC TCTCATATTT TGGGTGTATC AAAAGTTCTA GACTTTTGAA 
3701 GTTATGATTT CAGTGGCCCA CTTTATTTCT AAGGAAGAGT GTCTACTTTG 
3751 GAACGATACT TTGCACATAG TAGGAACTCA AGAAATACAT TTGAATAATT 
3801 ATAATTAACT GTTTAGCTAT CTTAATGAGA ATTTGTTGAC AACAAAAGAT 
3851 CATCCATCGC CTTATGTGTG AGTAAGATTG GAGCCTCTAT CAAGATTTAG 
3901 TCAAGTTCAG TTAGATTGAT TCTAGAAACA AATATTTATT TCTTTCTTTT 
3951 ACGGGGATGT GAATAAGGCT TTTCCTTAAG GCCTTCATTC TTTAAACAAA 
4001 CAGGTTGAAA TGGTATGTTG TAAAAGAGAA GACGGGAGAG AGGTATTTAG 
4051 ATGATAAGTG TACTTCACAA AAATGCCAAA GTTTGAAAAA TAGGTATGTT 
4101 TGTTCTAAAT GTTTAAGTGC TTCTCTGTTA GGTTCTGGGG CTTGCAATCA 
4151 TTTGAATTGT TCTGTTTCAC AATAAAGGAG ATTCACTGGG TTCTGCATTT 
4201 TCAGGATTCA ATAGAACTGC TCCATTAAAA AAATAATCCT TAGCAAGCAT 
4251 TCGAATCCTA ACTGCTTTGA TGCACTTGCC CTCGGGCACC TGTCATTTCC 
4301 AATATGGTAG GTGTCAAAGT CAAAAGTATT TACTGGGAGA AAAAAGAGAG 
4351 GAGTGGTTGT AGAAGTCTCC CTAAATCAGA CATGTCAAGC AATCAGCCAA 
4401 CGTGGTGTAT TTCTCATTCA ATATTTTAGT GTGAATTGAG ACACTGAGAT 
4451 AAAGACATCG TGCAGAGATA AATGGGGATA CAGTTAAATG TAGCAACTCT 
4501 TGAGTTCATT TTTTCCCACT GTAGCAAAAT TAATGCTTTC TCTTTATTGA 
4551 AATAAATTGC TCATTCCTCC AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
4601 AAAAAAGG 



BLAST Results 



Entry HSG27587 from database EMBL: 
human STS SHGC-32548. 
Score = 1951, P = 9.0e-101, identities = 411/425 

Entry HS073350 from database EMBL: 
human STS EST303564. 
Score = 1417, P = 8.7e-58, identities = 285/287 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from the beginning to 580 bp; peptide length: 194 
Category: questionable ORF 
Classification: no clue 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_18f3, frame 2 

PIR:CGBO1S collagen alpha 1(I) chain - bovine (fragments), N = 1, Score 
= 155, P = 4.5e-10 

TREMBL:HSCG1PA1_1 gene: "COL1A1"; Human proalpha 1 (I) chain of type I 
procollagen mRNA (partial)., N = 1, Score = 155, P = 6.5e-10 

>PIR:CGBO1S collagen alpha 1(I) chain - bovine (fragments) 
Length = 779 

HSPs: 

Score = 155 (23.3 bits), Expect = 4.5e-10, P = 4.5e-10 
Identities = 60/152 (39%), Positives = 67/152 (44%) 



Query: 7 GEAGGPGAAWARRAAALPGTAA--GPPRPAAPPGA--APARGGPAPGAPAQALPRSQRGR 62 
G+ G PG + AR PG GPP PA P GA AP G A A P SQ 
Sbjct: 230 GDLGAPGPSGARGERGFPGERGVEGPPGPAGPRGANGAPGNDGAKGDAGAPGAPGSQGAP 289 

Query: 63 QLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAE 122 
L G P RGA PG GD +GA G + G VR L + PG A 
Sbjct: 290 GL QGMPGE-RGAAGLPGPKGDRGDAGPKGADGAPGKDG VRGLTGPIGPPGPAG 341 

Query: 123 GAGDRGHL-P-GP DARDPELPRVFLPLAGLRGPPAA 156 
GD+G P GP D +P P P AG GPP A 
Sbjct: 342 APGDKGEAGPSGPAGTRGAPGDRGEPGPPG--P-AGFAGPPGA 381 

Score = 121 (18.2 bits), Expect = 5.4e-05, P = 5.4e-05 
Identities = 52/154 (33%), Positives = 60/154 (38%) 

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARG GPAPGAPAQALPRSQRG 61 

G G PGAA R P AGPP PPG ++G GPA G P + P G 

Sbjct: 434 GATGFPGAA-GRVGPPGPSGNAGPPGPPGPAGKEGSKGPRGETGPA-GRPGEVGPPGPPG 491 

Query: 62 RQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAA 121 

AGP G PG PG RG G +RG R L PG + 
Sbjct: 492 P--AGEKGAPGAD-GPAGAPGTPGPQGIAGQRGVVGLPGQRGE RGFPGL PGPS 541 

Query: 122 EGAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAAVRE 160 

G +G R P P + GL GPP + RE 
Sbjct: 542 GEPGKQGPSGASGERGPPGP MGPPGLAGPPGESGRE 577 

Score = 117 (17.6 bits), Expect = 1.8e-04, P = 1.8e-04 
Identities = 52/148 (35%), Positives = 62/148 (41%) 

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAA PPGAAPARGGPAPGAPAQALPRSQRG-R 62 
G G PG AR +A PG A G P A PPG + GP PG P A +G R 
Sbjct: 416 GNVGAPGPKGARGSAGPPG-ATGFPGAAGRVGPPGPS-GNAGP-PGPPGPAGKEGSKGPR 472 

Query: 63 QLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRH--HHVRSLADLLQLPGA 120 
GRP G + PG PG GA G G + ++ LPG 
Sbjct: 473 GETGPAGRP GEVGPPGPPGPAGEKGAPGADGPAGAPGTPGPQGIAGQRGVVGLPGQ 528 

Query: 121 AEGAGDRGH--LPGPDARDPEL-PRVFLPLAGLRGPP 154 
G+RG LPGP + P +G RGPP 
Sbjct: 529 R GERGFPGLPGPSGEPGKQGPS GASGERGPP 559 

Score = 117 (17.6 bits), Expect = 1.8e-04, P = 1.8e-04 
Identities = 54/162 (33%), Positives = 64/162 (39%) 

Query: 7 GEAGGPGAAWARRAAALPGT--AAGPPRPAAPPGAAPARG--GPA--PGAPAQALPRSQR 60 

G G PG + PG A+GP P PPG GGAPGP+P + 

Sbjct: 29 GPPGAPGPQGFQGPPGEPGEPGASGPMGPRGPPGPPGKNGDDGEAGKPGRPGERGPPGPQ 88 

Query: 61 G-RQLAERNGRP--RRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHV--RSLADLL 115 

G R L G P + HRG G GD +G G G + R L 

Sbjct: 89 GARGLPGTAGLPGMKGHRGFSGLDGAKGDAGPAGPKGEPGSPGENGAPGQMGPRGLPGFP 148 

Query: 116 QLPGAA--EG-AGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAA 157 

GAA G AG+RG +PGP P AG +GPP A 

Sbjct: 149 GPKGAAGEPGKAGERG-VPGPPGAVG--PAGKDGEAGAQGPPGPA 190 

Score = 113 (17.0 bits), Expect = 5.4e-04, P = 5.4e-04 
Identities = 54/148 (36%), Positives = 58/148 (39%) 

Query: 7 GEAGGPGAAWARRAAALPGTA--AGPPRPAAP--PGAAPARGGPAP-GAPAQALPR 57 
G AG PGA A PG A AGPP PA P PG G P P GA A P 
Sbjct: 374 

Query: 58 
PG PG PG AG++G PG D +G +AG RG GR 
Sbjct: 434 486 

Query: 118 
Sbjct: 487 

Score = 110 (16.5 bits), Expect = 1.3e-03, P = 1.2e-03 
Identities = 54/151 (35%), Positives = 60/151 (39%) 



Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPG--AAPAR-GGPAP-GAPAQALPRSQRGR 62 
GE G G A + LPG A GPP A PG P G P P GA + +RG 
Sbjct: 194 GERGEQGPAGSPGFQGLPGPA-GPPGEAGKPGEQGVPGDLGAPGPSGARGERGFPGERGV 252 

Query: 63 QLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAE 122 
+ PR GA G GD A G+ G +G R A L PG 
Sbjct: 253 EGPPGPAGPRGANGAPGNDGAKGDAGAPGAPGSQGAPGLQGMPGE-RGAAGLPGPK- 307 

Query: 123 GAGDRGHLPGPDARD--PELPRVFLPLAGLRGPPAAA 157 
GDRG GP D P V L G GPP A 
Sbjct: 308 --GDRGDA-GPKGADGAPGKDGV-RGLTGPIGPPGPA 340 



Score = 109 (16.4 bits), Expect = 1.7e-03, P = 1.7e-03 
Identities = 55/154 (35%), Positives = 60/154 (38%) 

Query: 4 NGN-GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARG-GPAPGAPAQALPRSQRG 61 
NG+ GEAG PG R P AGP A PG RG GA A P +G 
Sbjct: 67 NGDDGEAGKPGRP-GERGPPGPQGARGLPGTAGLPGMKGHRGFSGLDGAKGDAGPAGPKG 125 

Query: 62 RQLAE-RNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLL 115 
+ NGP+G PGPGA GG G V A 
Sbjct: 126 EPGSPGENGAPGQ-MGPRGLPGFPGPKGAAGEPGKAGERGVPGPPGAVGPAGKDGEAGAQ 184 

Query: 116 QLPGAAEGAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAA 157 
PG A AG+RG GP A P F L G GPP A 
Sbjct: 185 GPPGPAGPAGERGE-QGP-AGSPG FQGLPGPAGPPGEA 220 




Score = 104 (15.6 bits), Expect = 6.6e-03, P = 6.6e-03 
Identities = 44/131 (33%), Positives = 49/131 (37%) 

Query: 2 EVNGNGEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAP-GAPAQALPRSQR 60 
E GE G PG R LPG GP A PG A RG P P GA A + 
Sbjct: 126 EPGSPGENGAPGQMGPR GLPGFP-GPKGAAGEPGKAGERGVPGPPGAVGPAGKDGEA 181 

Query: 61 GRQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGA 120 
G Q P RG G PG G+ G G G+ DL PG 
Sbjct: 182 GAQGPPGPAGPAGERGEQGPAGSPG--FQGLP-GPAGPPGEAGKPGEQGVPGDL-GAPGP 237 

Query: 121 AEGAGDRGHLPG 132 
+ G+RG PG 
Sbjct: 238 SGARGERG-FPG 248 




Score = 104 (15.6 bits), Expect = 6.6e-03, P = 6.6e-03 
Identities = 43/131 (32%), Positives = 55/131 (41%) 

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAPGAPAQALPRSQRGRQLAE 66 
GEAG GARA PG GPP PGA GP PGA Q + + G A+ 
Sbjct: 347 GEAGPSGPAGTRGA PGDR-GEPGPPGPAGFA GP- PGADGQPGAKGEPGDAGAK 397 

Query: 67 RNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAEGAGD 126 
+ P G PG G++ A +GA G G + A + PG + AG 
Sbjct: 398 GDAGPPGPAGPAGPPGPIGNVGAPGPKGARGSAGPPGATGFPGA-AGRVGPPGPSGNAGP 456 

Query: 127 RGHLPGPDARD 137 
G PGP ++ 
Sbjct: 457 PGP-PGPAGKE 466 




Score = 104 (15.6 bits), Expect = 6.6e-03, P = 6.6e-03 
Identities = 56/162 (34%), Positives = 62/162 (38%) 

Query: 7 GEAGGPGAAWARRAAALPGTAA--GPPRPAAPPGAAPARGGPAPGAPAQALPRSQRGRQL 64 
G G PGA A G GP P PGA ARG P P Q PR +G 
Sbjct: 608 GPPGAPGAPGPVGPAGKSGDRGETGPAGPIGPVGPAGARG PAGP-QG-PRGBKGZTG 662 

Query: 65 AERNGRPRRHRG ALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLA-DLLQ-LPG 119 
+ + + HRG PG PG GA G RG SDL LPG 
Sbjct: 663 ZZGBRGIKGHRGFSGLQGPPGPPGSPGEQGPSGASGPAGPRGPPGSAGSPGKDGLNGLPG 722 

Query: 120 AAEGAGDRGHL--PGPDARDPELPRVFLPLAGLRGPPAAAVREERLHRPVQ 168 
G RG GP A P P P G GPP+ L +P Q 
Sbjct: 723 PIGPPGPRGRTGDAGP-AGPPGPPG P-PGPPGPPSGGYDLSFLPQPPQ 768 

Score = 101 (15.2 bits), Expect = 1.5e-02, P = 1.5e-02 
Identities = 49/148 (33%), Positives = 55/148 (37%) 



Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAPGAPA QALPRSQRGR 62 

G AG PG A R PG A GP A G A A+G P P PA + P G 
Sbjct: 152 GAAGEPGKAGERGVPGPPG-AVGP AGKDGEAGAQGPPGPAGPAGERGEQGPAGSPGF 207 
Query: 63 QLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAE 122 

Q P G + G PGDL A G G RG R + PG A 

Sbjct: 208 QGLPGPAGPPGEAGKPGEQGVPGDLGAP GPSGARGERGFPGE-RGVEGP PGPAG 260 

Query: 123 GAGDRGHLPGPDARDPELPRVFLPLAGLRGPP 154 

G G PG D + PG+GP 
Sbjct: 261 PRGANG-APGNDGAKGDAGAPGAP--GSQGAP 289 

Score = 100 (15.0 bits), Expect = 1.9e-02, P = 1.9e-02 
Identities = 40/130 (30%), Positives = 48/130 (36%) 

Query: 7 GEAGGPGAAWARRAAALPGT--AAGPPRPAAPPGAAPARG--GPA--PGAPAQALPRSQR 60 

G G PG + PG A+GP P PPG GGAPGP+P + 

Sbjct: 29 GPPGAPGPQGFQGPPGEPGEPGASGPMGPRGPPGPPGKNGDDGEAGKPGRPGERGPPGPQ 88 

Query: 61 G-RQLAERNGRP--RRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQL 117 

G R L G P + HRG G GD +G G G + L 

Sbjct: 89 GARGLPGTAGLPGMKGHRGFSGLDGAKGDAGPAGPKGEPGSPGENGAPGQMGPRG-LPGF 147 

Query: 118 PGAAEGAGDRG 128 

PG AG+ G 
Sbjct: 148 PGPKGAAGEPG 158 

Score = 99 (14.9 bits), Expect = 2.5e-02, P = 2.5e-02 
Identities = 53/156 (33%), Positives = 61/156 (39%) 

Query: 7 GEAGGPGAAWARRA--AALPGT--AAGPPRPAAPPGAAPARG--GPA PGAPAQAL 55 

G G PGA R A PG AGPPPG+RG GPA P PA A 

Sbjct: 587 GRDGSPGAKGDRGETGPAGAPGPPGAPGAPGPVGPAGKSGDRGETGPAGPIGPVGPAGAR 646 

Query: 56 PRSQRGRQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHV 108 

PR +G + + + HRG G PG + +G G G 

Sbjct: 647 GPAGPQGPRGBKGZTGZZGBRGIKGHRGFSGLQGPPGPPGSPGEQGPSGASGPAGPRGP- 705 

Query: 109 RSLADLLQLPGAAEGAGDRG--HLPGPDARDPELPRVFLPLAGLRGPP 154 

PG+A G G LPGP P PR AG GPP 

Sbjct: 706 PGSAGSPGKDGLNGLPGPIG--PPGPRGRTGDAGPAGPP 742 

Score = 98 (14.7 bits), Expect = 3.3e-02, P = 3.3e-02 
Identities = 51/158 (32%), Positives = 58/158 (36%) 

Query: 7 GEAGGPGAAWARRAAALPGTA AGPPRPAAPPGAAPARGGPAP-GAPAQALPRSQR 60 

G G G R AA LPG AGP PG RG P G P A + 

Sbjct: 287 GAPGLQGMPGERGAAGLPGPKGDRGDAGPKGADGAPGKDGVRGLTGPIGPPGPAGAPGDK 346 

Query: 61 GRQLAERNGRPRRHRGA LAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQL 117 

G A+GP RGA +PG PG GA G +G + D 

Sbjct: 347 GE--AGPSG-PAGTRGAPGDRGEPGPPGPAGFAGPPGADGQPGAKGEPGDAGAKGDAGP- 402 

Query: 118 PGAAEGAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAAVR 159 

PGAAGG+ AP+R GGPAAR 
Sbjct: 403 PGPAGPAGPPGPIGNVGAPGPKGARGSAGPPGATGFPGAAGR 444 

Score = 96 (14.4 bits), Expect = 5.7e-02, P = 5.5e-02 
Identities = 46/152 (30%), Positives = 57/152 (37%) 

Query: 6 NGEAGGPGAAWARRAAALPGTAA--GPPRPAAPPGAAPARGGPAPGAPA-QALPRSQRGR 62 

+G G PGA + PG G PA PG A G P P PA ++ R + G 

Sbjct: 574 SGREGAPGAEGSPGRDGSPGAKGDRGETGPAGAPGPPGAPGAPGPVGPAGKSGDRGETGP 633 

Query: 63 QLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAE 122 

P RG G G+ +G G RG H R + L PG 

Sbjct: 634 AGPIGPVGPAGARGPAGPQGPRGB KGZTGZZGBRGIKGH-RGFSGLQGPPGPPG 686 

Query: 123 GAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAA 157 

G++G PAP AG RGPP +A 

Sbjct: 687 SPGEQG--PS-GASGP AGPRGPPGSA 709 

Score = 94 (14.1 bits), Expect = 9.7e-02, P = 9.2e-02 
Identities = 45/134 (33%), Positives = 56/134 (41%) 



Query: 24 PGTAAGPPRPAAPPGAAPARGGPA-PGAPAQALPRSQRGRQLAERNGRPRRHR--GALAQ 80 
P GPP PG +G P PG P + P RG G P ++ G + 
Sbjct: 21 PSGPRGLPGPPGAPGPQGFQGPPGEPGEPGASGPMGPRGPP GPPGKNGDDGEAGK 75 

Query: 81 PGHPGDLAA-GV--GRGAGGGHSRRGRHHHVRSLADLLQLPGAAEGAGDRGH--LPGPDA 135 
PG PG+ G RG G G H R + L G A AG +G PG + 
Sbjct: 76 PGRPGERGPPGPQGARGLPGTAGLPGMKGH-RGFSGLDGAKGDAGPAGPKGEPGSPGENG 134 

Query: 136 RDPEL-PRVFLPLAGLRGPPAAA 157 
++ PR LP G GP AA 
Sbjct: 135 APGQMGPRG-LP--GFPGPKGAA 154 

Score = 92 (13.8 bits), Expect = 1.7e-01, P = 1.5e-01 
Identities = 52/155 (33%), Positives = 58/155 (37%) 

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGP-APGAPAQALPRSQRGRQLA 65 

GEAG G A R A G GPP PA G AGPAGP A + G 
Sbjct: 347 GEAGPSGPAGTRGAPGDRGEP-GPPGPAGFAGPPGADGQPGAKGEPGDAGAKGDAGPPGP 405 

Query: 66 ERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGR--HHHVRSLADLLQLPGAA-- 121 

P G + PG G + GA G GR A PG A 

Sbjct: 406 AGPAGPPGPIGNVGAPGPKGARGSAGPPGATGFPGAAGRVGPPGPSGNAGPPGPPGPAGK 465 



Query: 


122 


Sbjct: 


466 


Score 


= 92 


Identities 


Query: 


7 


Sbjct: 


587 


Query: 


61 


Sbjct: 


647 


Query: 


116 


Sbjct: 


704 


Score 


= 90 


Identities « 


Query: 


7 


Sbjct: 


485 


Query: 


66 


Sbjct: 


539 


Query: 


115 


Sbjct: 


599 


Score 


= 83 


Identities * 


Query: 


7 


Sbjct: 


311 


Query: 


61 


Sb j ct : 


368 


Query: 


121 


Sbjct: 


424 


Score 


= 82 


Identities - 


Query: 


7 


Sbjct: 


275 


Query: 


67 


Sbjct: 


333 


Query: 


127 


Sbjct: 


388 



EG+ G RG GP R E+ 



P AG +G P A 



51/156 (32%), Positives = 57/156 (36% > 



G G PGA 



PG 



AGPP PG + RG 



LAERNGRPRRHRGALAQPGHPGDLA-AGVG--RGAGGGHSRRGRH — HHVRSLADLL 115 

AGPR+G+GG G+GG G A 

-AGPQG-PRGBKGZTGZZGBRGIKGHRGFSGLQGPPGPPGSPGEQGPSGASGPAGPR 703 



PG+A 



G G 



LPGP 



P PR AG GPP 

-PPGPRGRTGDAGPAGPP 742 



[13.5 bits), Expect = 2.8e-01, P = 2.5e-01 
» 45/134 (33%), Positives = 53/134 (39%) 

G EAGG PG AAW ARRAAAL PGT AAG P PR P AA P P G AA P ARGG P A PG AP AQAL P RS Q RG RQ- L A 65 
G G PG A + A GA P P PGA RG GPQ R +RG L 
GPPGPPGPAGEKGAPGADGPAGAPGTPG-PQGIAGQRG--VVGLPGQ RGERGFPGLP 538 



+G P + 



PAG 



GA + G PG + 



PGP 



AG 



-GR-GAGGGHSRRGRHHHVRSLADL 114 
GR GA G GR + D 



• 49/156 (31%), Positives =• 56/156 (35%) 

G EAGG PG AAW ARRAAAL PGT AA — GPPRPAAPPGAAPARG — GPAP- 
G+AG GA A + G GPP PA PG G GPA 



G PGD A 



-GAPAQALPRSQR 60 
GAP R + 



PG 



G G PG 



PRVFLP L AG LRG P P AAA V RE 160 

RV P AG GPP A +E 



DO, P = 9.0e-01 
52/148 (35%) 



GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAPGAPAQALPRSQRGRQLAE 66 
G+AG PGA ++ALGG A PG RGPAP R L 



G P 



+G PG 



PG G+ 



G G 



RG 



PGA 



A+ 



P AG GPP 
-P-AGPAGPP 412 



Peptide information for frame 3 
ORF from 12 bp to 755 bp; peptide length: 248 
Category: similarity to known protein 
Classification: unset 
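
The ORF coordinates and peptide length reported for frame 3 can be cross-checked against the insert sequence. The following is a minimal sketch, assuming 1-based inclusive coordinates and the standard genetic code; the helper names `orf_peptide_length` and `translate` are hypothetical and are not part of the original analysis pipeline. Only the first 50 bp of the insert are used here.

```python
# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_peptide_length(start_bp, end_bp):
    """Number of codons in an ORF given 1-based inclusive coordinates (assumed convention)."""
    span = end_bp - start_bp + 1
    if span % 3:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3


def translate(dna, start_bp):
    """Translate from a 1-based start position until a stop codon or the end of the input."""
    dna = dna.replace(" ", "").upper()
    peptide = []
    for i in range(start_bp - 1, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# First 50 bp of the DKFZphtes3_18f3 insert as listed above.
FIRST_50_BP = "GACAGAAGTG AATGGGAATG GAGAGGCCGG CGGCCCGGGA GCCGCATGGG"

print(orf_peptide_length(12, 755))   # codons between bp 12 and bp 755
print(translate(FIRST_50_BP, 12))    # N-terminal residues of the frame 3 peptide
```

Under these assumptions the span 12 to 755 bp corresponds to 248 codons, and the translated fragment agrees with the start of the SEQ line in the frame 3 Pedant report.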

Prosite motifs: LEUCINE_ZIPPER (17-39) 
LEUCINE_ZIPPER (24-46) 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_18f3, frame 3 

TREMBL:AF070675_1 product: "TNF-inducible protein CG12-1"; Homo 
sapiens TNF-inducible protein CG12-1 mRNA, complete cds., N = 1, Score 
= 135, P = 1e-06 

TREMBL:HS6802_1 gene: "dJ6802.1"; product: "dJ6802.1"; Homo sapiens 
DNA sequence from PAC 6802 on chromosome 22. Contains apolipoprotein L, 
myosin heavy chain, ESTs, CA repeat, STS and GSS., N = 1, Score = 107, 
P = 0.0023 



>TREMBL:AF070675_1 product: "TNF-inducible protein CG12-1"; Homo sapiens 
TNF-inducible protein CG12-1 mRNA, complete cds. 
Length = 331 

HSPs: 

Score = 135 (20.3 bits), Expect = 1.0e-06, P = 1.0e-06 
Identities = 30/103 (29%), Positives = 55/103 (53%) 



Query: 30 RLHRQVLRLREVARRLERLRRRSLVANVAGSSLSATGALAAIVGLSLSPVTLGTSLLVSA 89 

++ + +LR +A +E + R ++NV SS A + ++ GL L+P T GTSL ++A 
Sbjct: 91 KIQESIEKLRALANGIEEVHRGCTISNVVSSSTGAASGIMSLAGLVLAPFTAGTSLALTA 150 

Query: 90 VGLGVATAGGAVTITSDL-SLIFCNSRELRRVQEIAATCQDQMR 132 

G+G+ A IT+ + + +S E + AT D+++ 

Sbjct: 151 AGVGLGAASAVTGITTSIVEHSYTSSAEAE-ASRLTATSIDRLK 193 
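
The percentages in the HSP headers of this listing appear to be truncated rather than rounded (for example, 60/154 is reported as 38% and 55/154 as 35%). This is an inference from the printed figures, not something the source states. A minimal sketch of that arithmetic, with the hypothetical helper name `blast_percent`:

```python
def blast_percent(matches, aligned_length):
    """Integer percentage with truncation, as the HSP headers in this listing appear to use."""
    return 100 * matches // aligned_length


# Identities and Positives for the TNF-inducible protein CG12-1 HSP above.
print(blast_percent(30, 103), blast_percent(55, 103))
```

Rounding to nearest would give 39% for 60/154, which is why truncation is assumed here.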

Pedant information for DKFZphtes3_18f3, frame 2 



Report for DKFZphtes3_18f3.2 

[LENGTH] 193 
[MW] 19708.24 
[pI] 11.90 
[KW] All_Alpha 

[KW] LOW_COMPLEXITY 55.44 % 

SEQ TEVNGNGEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAPGAPAQALPRSQR 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx. . . 

PRD cccccccccccccchhhhhhhhhccccccccccccccccccccccccccccccchhhhhh 

SEQ GRQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGA 

SEG xxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhcccccccccccccccccccccccccccccccccccccccchhhhhhhhhccccc 

SEQ AEGAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAAVREERLHRPVQFCLLHRLLWLTW 

SEG xxxxxxxxxxxxx xxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccchhhhhhhhcccchhhhhhhhhhhhc 

SEQ LPHPQAGGGGHQG 

SEG xxxxxxxxxxxxx 

PRD ccccccccccccc 



(No Prosite data available for DKFZphtes3_18f3.2) 
(No Pfam data available for DKFZphtes3_18f3.2) 

Pedant information for DKFZphtes3_18f3, frame 3 

Report for DKFZphtes3_18f3.3 

[LENGTH] 248 
[MW] 27162.56 
[pI] 9.92 
[PROSITE] LEUCINE_ZIPPER 2 
[KW] TRANSMEMBRANE 1 
[KW] LOW_COMPLEXITY 30.65 % 
[KW] COILED_COIL 12.10 % 

SEQ MGMERPAAREPHGPDALRRFQGLLLDRRGRLHRQVLRLREVARRLERLRRRSLVANVAGS 

SEG XXXXXXXXXXXXXXXXXX . XXXXXXXXXXXXXXXXXXXX . . XXX 

PRD cccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccc 

COILS 

MEM 



SEQ SLSATGALAAIVGLSLSPVTLGTSLLVSAVGLGVATAGGAVTITSDLSLIFCNSRELRRV 

SEG XXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXX 

PRD cchhhhhhhhhhhhcccccccccccccccccceeeeccceeeeeeceeeeecchhhhhhh 

COILS 

MEM MMMMMMMMMMMMMMMMM 

SEQ QEIAATCQDQMREILSCLEFFCRWQGCGDRQLLQCGRNASIALYNSVYFIVFFGSRGFLI 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhcccccchhhhhccccchhhhhcceeeeeecccccccc 



COILS 
MEM 



SEQ PRRAEGDTKVSQAVLKAKIQKLAESLESCTGALDELSEQLESRVQLCTKSSRGHDLKISA 

SEG 

PRD ccccccccchhhhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhcccccceeeehh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

MEM 

SEQ DQRAGLFF 

SEG 

PRD hhhhhccc 

COILS 

MEM 



Prosite for DKFZphtes3_18f3.3 

PS00029 17->39 LEUCINE_ZIPPER PDOC00029 

PS00029 24->46 LEUCINE_ZIPPER PDOC00029 
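
The two overlapping LEUCINE_ZIPPER hits can be reproduced by scanning the peptide for the PROSITE PS00029 pattern L-x(6)-L-x(6)-L-x(6)-L. The sketch below is illustrative only: the function name `find_leucine_zippers` is hypothetical, and the input is the first 60 residues of the frame 3 peptide from the report above with OCR spacing removed.

```python
import re

# PS00029 pattern L-x(6)-L-x(6)-L-x(6)-L as a regular expression.
# Wrapping it in a lookahead lets overlapping matches be reported.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(protein):
    """Return (start, end) pairs, 1-based and inclusive, for every pattern match."""
    return [
        (m.start() + 1, m.start() + len(m.group(1)))
        for m in LEUCINE_ZIPPER.finditer(protein)
    ]


N_TERMINUS = "MGMERPAAREPHGPDALRRFQGLLLDRRGRLHRQVLRLREVARRLERLRRRSLVANVAGS"
print(find_leucine_zippers(N_TERMINUS))
```

The sketch reports the matched residues as 17 to 38 and 24 to 45. The listing gives 17->39 and 24->46, so its end coordinate appears to be one past the last matched residue; that reading is an observation from the numbers, not a documented convention.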



(No Pfam data available for DKFZphtes3_18f3.3) 
DKFZphtes3_1817 



group: cell structure and motility 

DKFZphtes3_1817 encodes a novel 1050 amino acid protein with weak partial similarity to 
ankyrins. 

The novel protein contains an ATP/GTP-binding site motif A (P-loop) and an Ank repeat. 
Ankyrins are peripheral membrane proteins which interconnect integral proteins with the 
spectrin-based membrane skeleton. Thus the novel protein seems to be involved in coupling of 
cytoskeleton and cell membrane. 

The new protein can find application in modulation of cytoskeleton-membrane interactions. 



similarity to ankyrins 
Sequenced by MediGenomix 
Locus: unknown 
Insert length: 4501 bp 

Poly A stretch at pos. 4423, no polyadenylation signal found 



1 GATCGCCGCG CGAGGGTGGT GGGCATCGAG GTCCCAGCAG CGGACGAGGG 
51 AGGTGCCGCC GTCGCCCAGG ATGGGCTGGG AATGAAGCGA TGTAGCCTTT 
101 TAAGAGATTT GCTCTGACCC ATCTGAAGTC CATATGGCTC TGTATGATGA 
151 AGACCTCCTG AAAAATCCTT TCTATCTGGC TCTGCAAAAG TGCCGCCCTG 
201 ACTTGTGCAG CAAAGTGGCC CAAATCCATG GCATTGTCTT AGTACCCTGC 
251 AAAGGAAGCC TGTCGAGCAG CATCCAGTCT ACTTGTCAGT TTGAGTCCTA 
301 CATTTTGATA CCTGTGGAAG AGCATTTTCA GACCTTAAAT GGAAAGGATG 
351 TCTTTATTCA AGGGAACAGG ATTAAATTAG GAGCTGGTTT TGCCTGTCTT 
401 CTCTCAGTGC CCATTCTCTT TGAAGAAACT TTCTACAATG AAAAAGAAGA 
451 GAGTTTCAGC ATCCTGTGTA TAGCCCATCC TTTGGAAAAG AGAGAGAGTT 
501 CAGAAGAGCC TTTGGCACCC TCAGATCCCT TTTCCCTGAA AACCATTGAA 
551 GATGTGAGAG AGTTCTTGGG AAGACACTCC GAGCGATTTG ACAGGAACAT 
601 CGCCTCTTTC CATCGAACAT TCCGAGAATG CGAGAGAAAG AGCCTCCGTC 
651 ACCACATAGA CTCAGCGAAT GCTCTCTACA CCAAATGCCT CCAGCAGCTT 
701 CTGAGGGACT CTCACCTGAA AATGCTCGCC AAGCAGGAGG CCCAGATGAA 
751 CCTGATGAAG CAGGCAGTGG AGATATACGT CCATCATGAA ATTTACAACC 
801 TGATCTTTAA ATACGTGGGG ACCATGGAGG CAAGTGAGGA TGCGGCCTTT 
851 AACAAAATCA CAAGAAGCCT TCAAGATCTT CAGCAGAAAG ATATTGGTGT 
901 GAAACCGGAG TTCAGCTTTA ACATACCTCG TGCCAAAAGA GAGCTGGCTC 
951 AGCTGAACAA ATGCACCTCC CCACAGCAGA AGCTTGTCTG CTTGCGAAAA 
1001 GTGGTGCAGC TCATTACACA GTCTCCAAGC CAGAGAGTGA ACCTGGAGAC 
1051 CATGTGTGCT GATGATCTGC TATCAGTCCT GTTATACTTG CTTGTGAAAA 
1101 CGGAGATCCC TAATTGGATG GCAAATTTGA GTTACATCAA AAACTTCAGG 
1151 TTTAGCAGCT TGGCAAAGGA TGAACTGGGA TACTGCCTGA CCTCATTCGA 
1201 AGCTGCCATT GAATATATTC GGCAAGGAAG CCTCTCTGCT AAACCCCCTG 
1251 AGTCTGAGGG ATTTGGAGAC AGGCTGTTCC TTAAGCAGAG AATGAGCTTA 
1301 CTCTCTCAGA TGACTTCGTC TCCCACCGAC TGCCTGTTTA AGCACATTGC 
1351 ATCAGGTAAC CAGAAAGAAG TGGAGAGACT TCTGAGCCAA GAGGACCATG 
1401 ATAAAGATAC CGTCCAAAAG ATGTGTCACC CTCTCTGCTT CTGCGATGAC 
1451 TGTGAGAAAC TCGTCTCTGG GAGGTTGAAT GATCCCTCAG TTGTCACTCC 
1501 ATTCTCCAGA GACGACAGGG GGCACACCCC TCTCCATGTG GCTGCTGTCT 
1551 GTGGGCAGGC ATCCCTCATC GACCTCCTGG TTTCCAAGGG CGCCATGGTA 
1601 AATGCCACAG ACTACCATGG GGCCACTCCG CTCCACCTGG CCTGTCAGAA 
1651 GGGCTACCAG AGCGTGACGC TGCTGCTGCT GCACTACAAG GCCAGCGCGG 
1701 AAGTGCAGGA CAACAATGGG AATACGCCAC TCCACCTGGC CTGCACCTAC 
1751 GGCCACGAGG ACTGTGTGAA GGCTCTGGTT TACTACGACG TGGAGTCGTG 
1801 CAGACTTGAC ATTGGCAATG AGAAAGGAGA CACCCCTCTA CACATTGCTG 
1851 CCCGCTGGGG CTACCAAGGC GTCATAGAGA CATTGCTGCA GAACGGAGCG 
1901 TCCACCGAGA TCCAGAACAG ACTGAAGGAG ACGCCCCTCA AGTGTGCATT 
1951 AAACTCAAAG ATTCTGTCTG TAATGGAAGC CTATCACCTG TCCTTCGAGA 
2001 GGAGGCAGAA GTCGTCCGAG GCCCCTGTGC AGTCCCCGCA GCGCTCCGTG 
2051 GACTCCATCA GCCAAGAGTC CTCCACTTCC AGCTTCTCCT CCATGTCAGC 
2101 CGGCTCAAGG CAGGAGGAGA CCAAGAAGGA CTACAGAGAG GTAGAAAAAC 
2151 TTTTGAGAGC AGTTGCTGAT GGAGATCTAG AAATGGTGCG TTACCTGTTG 
2201 GAATGGACAG AGGAGGACCT GGAGGATGCG GAGGACACTG TCAGTGCAGC 
2251 AGACCCCGAA TTCTGTCACC CGTTGTGCCA GTGCCCCAAG TGTGCCCCAG 
2301 CTCAGAAGAG GCTGGCGAAG GTTCCTGCCA GTGGGCTTGG TGTGAACGTG 
2351 ACCAGCCAGG ACGGCTCCTC CCCGCTGCAT GTCGCCGCCC TGCACGGCCG 
2401 GGCGGACCTC ATCCGCCTCC TGCTGAAGCA CGGGGCCAAC GCAGGTGCCA 
2451 GGAACGCAGA CCAAGCCGTC CCGCTCCACC TGGCCTGCCA GCAGGGCCAC 
2501 TTTCAGGTGG TGAAGTGTCT GTTAGATTCG AATGCAAAAC CCAATAAGAA 
2551 GGACCTCAGT GGAAACACGC CCCTCATTTA CGCCTGCTCC GGTGGCCATC 
2601 ACGAGCTTGT GGCACTGCTG CTACAGCACG GGGCCTCCAT TAACGCTTCT 
2651 AACAATAAGG GCAACACAGC GCTGCACGAG GCTGTGATTG AAAAGCACGT 
2701 CTTCGTGGTA GAGCTGCTTC TGCTCCACGG AGCGTCAGTT CAGGTGCTGA 
2751 ACAAGCGGCA GCGCACGGCT GTAGACTGTG CTGAACAGAA TTCAAAAATA 
2801 ATGGAATTGC TTCAGGTGGT ACCAAGCTGT GTTGCTTCAT TAGATGATGT 
2851 GGCTGAAACT GACCGCAAGG AGTATGTCAC TGTTAAGATC AGGAAAAAAT 
2901 GGAACTCAAA ACTGTATGAT CTACCAGATG AGCCTTTTAC AAGACAGTTT 
2951 TACTTTGTCC ACTCAGCTGG TCAGTTTAAG GGAAAGACTT CAAGGGAGAT 
3001 TATGGCAAGA GATAGAAGTG TCCCTAATTT AACCGAAGGT TCTTTGCATG 
3051 AGCCAGGGAG GCAAAGTGTC ACACTGAGAC AGAATAACCT GCCAGCTCAG 
3101 AGTGGATCTC ATGCTGCTGA GAAAGGCAAC AGCGACTGGC CAGAGAGGCC 
3151 TGGACTGACA CAGACTGGCC CTGGACACAG ACGGATGCTG CGGAGACACA 
3201 CGGTAGAGGA TGCGGTCGTG TCCCAGGGCC CGGAGGCTGC TGGCCCCCTC 
3251 TCCACTCCCC AAGAGGTTAG TGCTTCCCGG TCCTAACAGG AATGAGGAGT 
3301 TGTTGAACCC ACTGCTAGGA AGCAAGGATG CAACAAGATG ATGCTGAGCG 
3351 TGAACACATC TGAGAACTAA ATGTGCTTCC ATGAGACTGG CTTGAGAAGT 
3401 CTTCAGCACC AAGTTCCTGA AAGCTTTTCT GTGGCAGGAA AGAATGCAAC 
3451 AAAAAAGTTA ACCACCACCA TCTCTCTCCT CTTCAAAGCT AATGAATACA 
3501 ATTGAAACAG ACAAAAATTC CAGTAGCATC CAGATCCTTA AGCCAGAGGT 
3551 GCATGCTTCT TTTTAAGTAT GAGGGTTTGT TGGTCACAGT GGGAGAGGTT 
3601 TCACCACCGC ATTCTGACCT CCTCCTCCCA AAAGGTGCTA AACCTCTCTG 
3651 ACCTGTGTAC ATTCACAAAC CACAGCTAGA ATTCCTCCAC CTAGGATTAA 
3701 GCTGGAGAGA AGTAAGTAAT TTAGGTTTCA TGGTACTGTA GAGGCCAGGC 
3751 TGAAATGTCA TATCTGAAGG AAGAAAGCAG CAGCTGGACA ATGTTTCTTT 
3801 GCAAAGCAAC ACTCGAACCA AAAGATGCCT CAATCCCATT TTGATATTCA 
3851 TTTTAGTGAA AGGATGCATC AGACCTGTTC CACATCATGC ACATGGGAAA 
3901 GGGTGGTTAT CATTTTCCTT CTAACAAGTA GGTACAGATA TTCGGTTACT 
3951 ACACGTGCAC CTGTAGCAGT ATTTCTAGAA ACATCCCTTT TTGTTGAGAA 
4001 CCTCCCTTGA ATGTCTGTCA CACTCACACC TGACGGGATG GTTACTGGAT 
4051 TAGAGAGTAG ATTTGGCACA TCTTTTCTTA GTCTTTTGAT TCAAATTCAA 
4101 AACTTAACAG CACAAACCAG GTCAGAGTTA CTTTCGGTTA GAATTTATTG 
4151 CCATTTATTC CTTTTTATAA ATTTCTATAG ATTATACTGT TATTTTTATG 
4201 TTATTGGCCT AGAGCTACAC GTATATGGGT TTGTCCTGAG TCCGTTTTCA 
4251 AATGACCTTG TGATAGGGAA ATGGTTTTGT CCATGTTCTT GGAAATACTT 
4301 GTGTATGTAC AGAAGGAAGG GAGGGATTAT TTTTCTACAA AGTAATTTAT 
4351 GATTTCTAAT TTTCTAATGT GCCTTGGATA TGTGCCAAAT GATGGAAAAG 
4401 AAACAGTAAA CTTTATGATT CTTAAAAAAA AAAAAAAAAA AAAAAAAAAA 
4451 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAG 
4501 G 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 134 bp to 3283 bp; peptide length: 1050 
Category: similarity to known protein 
Classification: Cell structure/motility 
Prosite motifs: ATP_GTP_A (945-953) 



1 MALYDEDLLK NPFYLALQKC RPDLCSKVAQ IHGIVLVPCK GSLSSSIQST 
51 CQFESYILIP VEEHFQTLNG KDVFIQGNRI KLGAGFACLL SVPILFEETF 
101 YNEKEESFSI LCIAHPLEKR ESSEEPLAPS DPFSLKTIED VREFLGRHSE 
151 RFDRNIASFH RTFRECERKS LRHHIDSANA LYTKCLQQLL RDSHLKMLAK 
201 QEAQMNLMKQ AVEIYVHHEI YNLIFKYVGT MEASEDAAFN KITRSLQDLQ 
251 QKDIGVKPEF SFNIPRAKRE LAQLNKCTSP QQKLVCLRKV VQLITQSPSQ 
301 RVNLETMCAD DLLSVLLYLL VKTEIPNWMA NLSYIKNFRF SSLAKDELGY 
351 CLTSFEAAIE YIRQGSLSAK PPESEGFGDR LFLKQRMSLL SQMTSSPTDC 
401 LFKHIASGNQ KEVERLLSQE DHDKDTVQKM CHPLCFCDDC EKLVSGRLND 
451 PSVVTPFSRD DRGHTPLHVA AVCGQASLID LLVSKGAMVN ATDYHGATPL 
501 HLACQKGYQS VTLLLLHYKA SAEVQDNNGN TPLHLACTYG HEDCVKALVY 
551 YDVESCRLDI GNEKGDTPLH IAARWGYQGV IETLLQNGAS TEIQNRLKET 
601 PLKCALNSKI LSVMEAYHLS FERRQKSSEA PVQSPQRSVD SISQESSTSS 
651 FSSMSAGSRQ EETKKDYREV EKLLRAVADG DLEMVRYLLE WTEEDLEDAE 
701 DTVSAADPEF CHPLCQCPKC APAQKRLAKV PASGLGVNVT SQDGSSPLHV 
751 AALHGRADLI RLLLKHGANA GARNADQAVP LHLACQQGHF QVVKCLLDSN 
801 AKPNKKDLSG NTPLIYACSG GHHELVALLL QHGASINASN NKGNTALHEA 
851 VIEKHVFVVE LLLLHGASVQ VLNKRQRTAV DCAEQNSKIM ELLQVVPSCV 
901 ASLDDVAETD RKEYVTVKIR KKWNSKLYDL PDEPFTRQFY FVHSAGQFKG 
951 KTSREIMARD RSVPNLTEGS LHEPGRQSVT LRQNNLPAQS GSHAAEKGNS 
1001 DWPERPGLTQ TGPGHRRMLR RHTVEDAVVS QGPEAAGPLS TPQEVSASRS 

BLAST P hits 

No BLAST P hits available 

Alert BLAST P hits for DKFZphtes3_1817, frame 2 

TREMBL:HSU43965_1 gene: "ANK3"; product: "ankyrin G119"; Human ankyrin 
G119 (ANK3) mRNA, complete cds., N = 2, Score = 287, P = 3.7e-21 

PIR:149502 ankyrin - mouse, N = 3, Score = 365, P = 2.2e-27 

TREMBL:HSANKY_2 product: "alt. ankyrin (variant 2.2)"; Human mRNA for 
ankyrin (variant 2.1), N = 2, Score = 380, P = 7.3e-31 

SWISSPROT:ANK1_HUMAN ANKYRIN R (ANKYRINS 2.1 AND 2.2) (ERYTHROCYTE 
ANKYRIN)., N = 2, Score = 380, P = 8.2e-31 

PIR:SJHUK ankyrin 1, erythrocyte splice form 1 - human, N = 2, Score = 
380, P = 8.2e-31 



>TREMBL:HSANKY_2 product: "alt. ankyrin (variant 2.2)"; Human mRNA for 
ankyrin (variant 2.1) 
Length = 1,719 

HSPs: 

Score = 380 (57.0 bits), Expect = 7.3e-31, Sum P(2) = 7.3e-31 
Identities = 139/447 (31%), Positives = 207/447 (46%) 

Query: 462 RGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKAS 521 
+G+T LH+AA+ GQ ++ LV+ GA VNA G TPL++A Q+ + V LL A+ 
Sbjct: 77 KGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENHLEVVKFLLENGAN 136 

Query: 522 [sequence illegible] 558 
V +G TPL +A GHE+- V L+ Y + RL 
Sbjct: 137 QNVATEDGFTPLAVALQQGHENVVAHLINYGTKGKVRLPALHIAARNDDTRTAAVLLQND 196 

Query: 559 DIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKETPLKCALNSKILSVME 615 
D+ ++ G TPLHIAA + V + LL GAS + TPL A S+ +V+ 
Sbjct: 197 PNPDVLSKTGFTPLHIAAHYENLNVAQLLLNRGASVNFTPQNGITPLHIA--SRRGNVIM 254 

Query: 616 AYHLSFERRQKSSEAPVQSPQRSVDSISQESSTS-SFSSMSAGSR-QEETKKDYREVEKL 673 
L +R + E + + ++ S + G+ Q +TK + 
Sbjct: 255 V-RLLLDRGAQI-ETKTKDELTPLHCAARNGHVRISEILLDHGAPIQAKTKNGLSPIHM- 311 

Query: 674 LRAVADGD-LEMVRYLLEWTEEDLEDAEDTVSAADPEFCHPLCQCPKCAPAQKRLAKVPA 732 
A GD L+ VR LL++ E ++D T+ P H C R+AKV 
Sbjct: 312 AAQGDHLDCVRLLLQYDAE-IDDI--TLDHLTP--LHVAAHC GHHRVAKVLL 358 

Query: 733 S-GLGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQ 791 
G N + +G +PLH+A ++ LLLK GA+ A PLH+A GH 
Sbjct: 359 DKGAKPNSRALNGFTPLHIACKKNHVRVMELLLKTGASIDAVTESGLTPLHVASFMGHLP 418 

Query: 792 VVKCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAV 851 
+VK LL A PN ++ TPL A GH E+ LLQ+ A +NA T LH A 
Sbjct: 419 IVKNLLQRGASPNVSNVKVETPLHMAARAGHTEVAKYLLQNKAKVNAKAKDDQTPLHCAA 478 

Query: 852 IEKHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNSKIMELLQVV 896 
H +V+LLL + A+ + T + A + + +L ++ 
Sbjct: 479 RIGHTNMVKLLLENNANPNLATTAGHTPLHIAAREGHVETVLALL 523 
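
The percentages printed in an HSP header can be recomputed from the raw counts on the same line. The sketch below parses an Identities/Positives line with a regular expression and recomputes both values. The helper name `check_hsp_percentages` is hypothetical, and the use of truncation rather than rounding is an assumption inferred from the figures in this report (for example, 195/447 is printed as 43%).

```python
import re

# Matches a cleaned HSP header line such as the one used below
HEADER_RE = re.compile(
    r"Identities = (\d+)/(\d+) \((\d+)%\), Positives = (\d+)/(\d+) \((\d+)%\)"
)


def check_hsp_percentages(line: str) -> dict:
    """Recompute the identity/positive percentages of one header line."""
    m = HEADER_RE.search(line)
    if m is None:
        raise ValueError("not an Identities/Positives line")
    ident, ident_len, ident_pct, pos, pos_len, pos_pct = map(int, m.groups())
    identity = 100 * ident // ident_len   # truncating division (assumed)
    positive = 100 * pos // pos_len
    return {
        "identity_pct": identity,
        "positive_pct": positive,
        "matches_report": identity == ident_pct and positive == pos_pct,
    }


line = "Identities = 139/447 (31%), Positives = 207/447 (46%)"
print(check_hsp_percentages(line))
```

A check of this kind is also a cheap way to catch digits that were misread during text extraction.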




Score = 378 (56.7 bits), Expect = 1.2e-30, Sum P(2) = 1.2e-30 
Identities = 130/447 (29%), Positives = 195/447 (43%) 

Query: 465 TPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKASAEV 524 
TPLH AA G + ++L+ GA + A +G +P+H+A Q + LLL Y A + 
Sbjct: 274 TPLHCAARNGHVRISEILLDHGAPIQAKTKNGLSPIHMAAQGDHLDCVRLLLQYDAEIDD 333 

Query: 525 QDNNGNTPLHLACTYGHEDCVKALVYYDVE SCR 557 
+ T PLH+A GH K L+ + +C+ 
Sbjct: 334 ITLDHLTPLHVAAHCGHHRVAKVLLDKGAKPNSRALNGFTPLHIACKKNHVRVMELLLKT 393 

Query: 558 LDIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKETPLKCALNSKILSVM 614 
+D EG TPLH+A+ G+ +++ LLQ GAS + N ETPL A + V 
Sbjct: 394 GASIDAVTESGLTPLHVASFMGHLPIVKNLLQRGASPNVSNVKVETPLHMAARAGHTEVA 453 

Query: 615 EAYHLSFERRQKSSEAPVQSPQRSVDSISQESSTSSFSSMSAGSRQEETKKDYREVEKLL 674 
+ Y L + + + Q+P I + +A T L 
Sbjct: 454 K-YLLQNKAKVNAKAKDDQTPLHCAARIGHTNMVKLLLENNANPNLATTAGH TPLH 508 

Query: 675 RAVADGDLEMVRYLLEWTEEDLEDAEDTVSAADPEFCHPLCQCPKCAPAQKRLAKVPASG 734 
A +G +E V LLE ++AT PH+KA+L + 
Sbjct: 509 IAAREGHVETVLALLE---KEASQACMTKKGFTP--LHVAAKYGKVRVAELLLER D 559 

Query: 735 LGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVK 794 
N ++G +PLHVA H D+++LLL G + + + PLH+A +Q +V + 
Sbjct: 560 AHPNAAGKNGLTPLHVAVHHNNLDIVKLLLPRGGSPHSPAWNGYTPLHIAAKQNQVEVAR 619 

Query: 795 CLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEK 854 
LL N + + G TPL A GH E+VALLL A+ N N G T LH E 
Sbjct: 620 SLLQYGGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQANGNLGNKSGLTPLHLVAQEG 679 

Query: 855 HVFVVELLLLHGASVQVLNKRQRTAVDCAEQ--NSKIMELL 893 
HV V ++L+ HG V + T + A N K+++ L 
Sbjct: 680 HVPVADVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFL 720 




Score = 367 (55.1 bits), Expect = 1.8e-29, Sum P(2) = 1.8e-29 
Identities = 131/489 (26%), Positives = 210/489 (42%) 

Query: 404 HIAS--GNQKEVERLLSQEDHDKDTVQKMCHPL-CFCDDCEKLVSGRLNDPSVVTPFSRD 460 
HIAS GN V LL + + + PL C + +S L D ++ 
Sbjct: 244 HIASRRGNVIMVRLLLDRGAQIETKTKDELTPLHCAARNGHVRISEILLDHGAPIQ-AKT 302 

Query: 461 DRGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKA 520 
G +P+H+AA + LL+ A ++ TPLH+A G+ V +LL A 
Sbjct: 303 KNGLSPIHMAAQGDHLDCVRLLLQYDAEIDDITLDHLTPLHVAAHCGHHRVAKVLLDKGA 362 

Query: 521 SAEVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGV 580 
+ NG TPLH+AC H ++ L+ +D EG TPLH+A+ G+ + 
Sbjct: 363 KPNSRALNGFTPLHIACKKNHVRVMELLLK TGASIDAVTESGLTPLHVAS FMGHLPI 419 

Query: 581 IETLLQNGASTEIQNRLKETPLKCAL NSKILSVMEAYHLSFERRQKSSEAPVQSPQR 637 
++ LLQ GAS + N ETPL A ++++ + + K + P+ R 
Sbjct: 420 VKNLLQRGASPNVSNVKVETPLHMAARAGHTEVAKYLLQNKAKVNAKAKDDQTPLHCAAR 479 

Query: 638 SVDSISQESSTSSFSSMSAGSRQEETKKDYREVEKLLRAVADGDLEMVRYLLEWTE 693 
++ + E++ + + +AG VE +L + + +T 
Sbjct: 480 IGHTNMVKLLLENNANPNLATTAGHTPLHIAAREGHVETVLALLEKEASQACMTKKGFTP 539 

Query: 694 EDLEDAEDTVSAAD PEFCHPLCQ CP-KCAPAQKRLAKVPA SGLGVNVTS 741 
+ VA+ HP PALV G+ + 
Sbjct: 540 LHVAAKYGKVRVAELLLERDAHPNAAGKNGLTPLHVAVHHNNLDIVKLLLPRGGSPHSPA 599 

Query: 742 QDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCLLDSNA 801 
+G +PLH+AA + ++ R LL++G +A A + PLHLA Q+GH ++V LL A 
Sbjct: 600 WNGYTPLHIAAKQNQVEVARSLLQYGGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQA 659 

Query: 802 KPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHVFVVEL 861 
N + SG TPL GH + +L++HG ++A+ G T LH A ++ +V+ 
Sbjct: 660 NGNLGNKSGLTPLHLVAQEGHVPVADVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKF 719 

Query: 862 LLLHGASVQVLNK 874 
LL H A V K 
Sbjct: 720 LLQHQADVNAKTK 732 




Score = 345 (51.8 bits), Expect = 4.2e-27, Sum P(2) = 4.2e-27 
Identities = 146/506 (28%), Positives = 233/506 (46%) 

Query: 404 HIAS--GNQKEVERLLSQEDHDKDTVQK MCHPLCFCDDCEKLVSGRLNDPSVVTPFS 458 
H+AS G+ K V LL +E + T +K H +++V +N + V + 
Sbjct: 50 HLASKEGHVKMVVELLHKEIILETTTKKGNTALHIAALAGQ-DEVVRELVNYGANVN--A 106 

Query: 459 RDDRGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHY 518 
+ +G TPL++AA ++ L+ GA N G TPL +A Q+G+++V L++Y 
Sbjct: 107 QSQKGFTPLYMAAQENHLEVVKFLLENGANQNVATEDGFTPLAVALQQGHENVVAHLINY 166 

Query: 519 KASAEVQDNNGNTP-LHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGY 577 
+V+ P LH+A ++D A V + D+ ++ G TPLHIAA + 
Sbjct: 167 GTKGKVR LPALHIAAR--NDDTRTAAVLLQNDP-NPDVLSKTGFTPLHIAAHYEN 218 

Query: 578 QGVIETLLQNGASTEIQNRLKETPLKCAL NSKILSVMEAYHLSFERRQKSSEAPVQS 634 
V + LL GAS + TPL A N ++ ++ E + K P+ 
Sbjct: 219 LNVAQLLLNRGASVNFTPQNGITPLHIASRRGNVIMVRLLLDRGAQIETKTKDELTPLHC 278 

Query: 635 PQRSVDSISQESSTSSFSSMSAGSRQEETKKDYREVEKLLRAVADGD-LEMVRYLLEWTE 693 
R+ E + + A +TK + A GD L+ VR LL++ 
Sbjct: 279 AARNGHVRISEILLDHGAPIQA KTKNGLSPIHM AAQGDHLDCVRLLLQYDA 329 

Query: 694 [sequence illegible] 729 
E ++D D++ CH + + P C R+ + 
Sbjct: 330 E-IDDITLDHLTPLHVAAHCGHHRVAKVLLDKGAKPNSRALNGFTPLHIACKKNHVRVME 388 

Query: 730 VPA-SGLGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQG 788 
+ +G ++ ++ G +PLHVA+ G +++ LL+ GA+ N PLH+A + G 
Sbjct: 389 LLLKTGASIDAVTESGLTPLHVASFMGHLPIVKNLLQRGASPNVSNVKVETPLHMAARAG 448 

Query: 789 HFQVVKCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALH 848 
H +V K LL + AK N K TPL A GH +V LLL++ A+ N + G+T LH 
Sbjct: 449 HTEVAKYLLQNKAKVNAKAKDDQTPLHCAARIGHTNMVKLLLENNANPNLATTAGHTPLH 508 

Query: 849 EAVIEKHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNSKIM--ELL 893 
A E HV V LL AS + K+ T + A + K+ ELL 
Sbjct: 509 IAAREGHVETVLALLEKEASQACMTKKGFTPLHVAAKYGKVRVAELL 555 




Score = 243 (36.5 bits), Expect = 1.6e-14, Sum P(2) = 1.6e-14 
Identities = 64/199 (32%), Positives = 97/199 (48%) 

Query: 404 HIAS--GNQKEVERLLSQEDHDKDTVQKMCHPLCFCDDCEKLVSGRLNDPSVVTPFSRDD 461 
H+A+ G + E LL ++ H + PL L +L P +P S 
Sbjct: 541 HVAAKYGKVRVAELLLERDAHPNAAGKNGLTPLHVAVHHNNLDIVKLLLPRGGSPHSPAW 600 

Query: 462 RGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKAS 521 
G+TPLH+AA Q + L+- G NA G TPLHLA Q+G+ + LLL +A+ 
Sbjct: 601 NGYTPLHIAAKQNQVEVARSLLQYGGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQAN 660 

Query: 522 AEVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGVI 581 
+ + +G TPLHL GH L+ + V +D G TPLH+A+ +G ++ 
Sbjct: 661 GNLGNKSGLTPLHLVAQEGHVPVADVLIKHGV---MVDATTRMGYTPLHVASHYGNIKLV 717 

Query: 582 ETLLQNGASTEIQNRLKETPL 602 
+ LLQ+ A + +L +PL 
Sbjct: 718 KFLLQHQADVNAKTKLGYSPL 738 




Score = 242 (36.3 bits), Expect = 5.0e-29, Sum P(2) = 5.0e-29 
Identities = 63/176 (35%), Positives = 92/176 (52%) 

Query: 734 GLGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVV 793 
G VN T Q+G +PLH+A+ G ++RLLL GA + D+ PLH A + GH ++ 
Sbjct: 229 GASVNFTPQNGITPLHIASRRGNVIMVRLLLDRGAQIETKTKDELTPLHCAARNGHVRIS 288 

Query: 794 KCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIE 853 
+ LLD A K +-G +P+ A G H + V LLLQ+ A I+ T LH A 
Sbjct: 289 EILLDHGAPIQAKTKNGLSPIHMAAQGDHLDCVRLLLQYDAEIDDITLDHLTPLHVAAHC 348 

Query: 854 KHVFVVELLLLHGA--SVQVLNKRQRTAVDCAEQNSKIMELLQVVPSCVASLDDVAET 909 
H V ++LL GA + + LN + C + + ++MELL AS+D V E+ 
Sbjct: 349 GHHRVAKVLLDKGAKPNSRALNGFTPLHIACKKNHVRVMELLLKTG ASIDAVTES 403 


Score = 242 (36.3 bits), Expect = 3.3e-14, Sum P(2) = 3.3e-14 
Identities = 80/284 (28%), Positives = 129/284 (45%) 

Query: 404 HIAS--GNQKEVERLLSQEDHDKDTVQKMCHPLCFCDDCEKLVSGRLNDPSVVTPFSRDD 461 
HIA+ G+ + V LL +E +K PL K+ L P + 
Sbjct: 508 HIAAREGHVETVLALLEKEASQACMTKKGFTPLHVAAKYGKVRVAELLLERDAHPNAAGK 567 

Query: 462 RGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKAS 521 
G TPLHVA ++ LL+ +G ++ ++G TPLH+A ++ V LL Y S 
Sbjct: 568 NGLTPLHVAVHHNNLDIVKLLLPRGGSPHSPAWNGYTPLHIAAKQNQVEVARSLLQYGGS 627 

Query: 522 AEVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGVI 581 
A + G TPLHLA GH + V L+ ++GN+ G TPLH+ A+ G+ V 
Sbjct: 628 ANAESVQGVTPLHLAAQEGHAEMVALLLSKQANG NLGNKSGLTPLHLVAQEGHVPVA 684 

Query: 582 ETLLQNGASTEIQNRLKETPLKCAL NSKILSVMEAYHLSFERRQKSSEAPV-QSPQR 637 
+ L+++G + R+ TPL A N K++ + + + K +P+ Q+ Q+ 
Sbjct: 685 DVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFLLQHQADVNAKTKLGYSPLHQAAQQ 744 

Query: 638 S-VDSISQ--ESSTSSFSSMSAGSRQEETKK--DYREVEKLLRAVAD 679 
D ++ ++ S S G+ K Y V +L+ V D 
Sbjct: 745 GHTDIVTLLLKNGASPNEVSSDGTTPLAIAKRLGYISVTDVLKVVTD 791 




Score = 235 (35.3 bits), Expect = 7.9e-34, Sum P(2) = 7.9e-34 
Identities = 58/165 (35%), Positives = 83/165 (50%) 

Query: 734 GLGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVV 793 
G MSG +PLH+AA G A+++ LLL AN N PLHL Q+GH V 
Sbjct: 625 GGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQANGNLGNKSGLTPLHLVAQEGHVPVA 684 

Query: 794 KCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIE 853 
L+ + G TPL A G+ +LV LLQH A +NA G + LH+A + 
Sbjct: 685 DVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFLLQHQADVNAKTKLGYSPLHQAAQQ 744 

Query: 854 KHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNS--KIMELLQVV 896 
H +V LLL +GAS ++ T + A++ + ++L+VV 
Sbjct: 745 GHTDIVTLLLKNGASPNEVSSDGTTPLAIAKRLGYISVTDVLKVV 789 

Score = 233 (35.0 bits), Expect = 7.9e-34, Sum P(2) = 7.9e-34 
Identities = 67/202 (33%), Positives = 100/202 (49%) 

Query: 404 HIAS-GNQKEVERLLSQEDHDKDTVQKMCH--PLCFCDDC-EKLVSGRLNDPSVVTPFSR 459 
H+A+ G+ + RLL QD+D++HPL C V+LD PSR 
Sbjct: 310 HMAAQGDHLDCVRLLLQYDAEIDDIT-LDHLTPLHVAAHCGHHRVAKVLLDKGA-KPNSR 367 

Query: 460 DDRGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYK 519 
G TPLH+A +++LL+ GA ++A G TPLH+A G+ + LL 
Sbjct: 368 ALNGFTPLHIACKKNHVRVMELLLKTGASIDAVTESGLTPLHVASFMGHLPIVKNLLQRG 427 

Query: 520 ASAEVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQG 579 
AS V + TPLH+A GH + K L+ +++ + TPLH AAR G+ 
Sbjct: 428 ASPNVSNVKVETPLHMAARAGHTEVAKYLLQ NKAKVNAKAKDDQTPLHCAARIGHTN 484 

Query: 580 VIETLLQNGASTEIQNRLKETPLKCA 605 
+++ LL+N A+ + TPL A 
Sbjct: 485 MVKLLLENNANPNLATTAGHTPLHIA 510 




Score = 226 (33.9 bits), Expect = 7.0e-33, Sum P(2) = 7.0e-33 
Identities = 53/153 (34%), Positives = 83/153 (54%) 

Query: 743 DGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCLLDSNAK 802 

Sbjct: 601 NGYTPLHIAAKQNQVEVARSLLQYGGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQAN 660 

Query: 803 PNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHVFVVELL 862 
N + SG TPL GH + +L++HG ++A+ G T LH A ++ +V+ L 
Sbjct: 661 GNLGNKSGLTPLHLVAQEGHVPVADVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFL 720 

Query: 863 LLHGAS VQVLNKRQRTAVDCAEQ--NSKIMELL 893 
LHAV K ++AQ++I+LL 
Sbjct: 721 LQHQADVNAKTKLGYSPLHQAAQQGHTDIVTLL 753 




Score = 198 (29.7 bits), Expect = 2.5e-11, Sum P(2) = 2.5e-11 
Identities = 51/157 (32%), Positives = 82/157 (52%) 

Query: 737 VNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCL 796 
+ T++ G++ LH+AAL G+ +++R L+ +GAN A++ PL++A Q+ H +VVK L 
Sbjct: 71 LETTTKKGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENHLEVVKFL 130 

Query: 797 LDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHV 856 
L++ A N G TPL A GH +VA L+ +G ALH A 
Sbjct: 131 LENGANQNVATEDGFTPLAVALQQGHENVVAHLINYGTKGKVRLPALHIAARNDDT 186 

Query: 857 FVVELLLLHGASVQVLNKRQRTAVDCAE--QNSKIMELL 893 
+LL + + VL+K T + A +N + +LL 
Sbjct: 187 RTAAVLLQNDPNPDVLSKTGFTPLHIAAHYENLNVAQLL 225 




Score = 186 (27.9 bits), Expect = 6.6e-29, Sum P(2) = 6.6e-29 
Identities = 55/143 (38%), Positives = 68/143 (47%) 

Query: 463 GHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKASA 522 
GHTPLH+AA G + L+ K A G TPLH+A + G V LLL A 
Sbjct: 503 GHTPLHIAAREGHVETVLALLEKEASQACMTKKGFTPLHVAAKYGKVRVAELLLERDAHP 562 

Query: 523 EVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGVIE 582 
NG TPLH+A + + D VK L+ S N G TPLHIAA+ V 
Sbjct: 563 NAAGKNGLTPLHVAVHHNNLDIVKLLLPRG-GSPHSPAWN--GYTPLHIAAKQNQVEVAR 619 

Query: 583 TLLQNGASTEIQNRLKETPLKCA 605 
+LLQ G S ++ TPL A 
Sbjct: 620 SLLQYGGSANAESVQGVTPLHLA 642 




Score = 182 (27.3 bits), Expect = 2.9e-28, Sum P(2) = 2.9e-28 
Identities = 54/185 (29%), Positives = 89/185 (48%) 

Query: 738 NVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCLL 797 
N+ ++ G +PLH+ A G + +L+KHG A PLH+A G+ ++VK LL 
Sbjct: 662 NLGNKSGLTPLHLVAQEGHVPVADVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFLL 721 

Query: 798 DSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHVF 857 
A N K G +PL A GH ++V LLL++GAS N ++ G T L A ++ 
Sbjct: 722 QHQADVNAKTKLGYSPLHQAAQQGHTDIVTLLLKNGASPNEVSSDGTTPLAIAKRLGYIS 781 

Query: 858 VVELLLLHGASVQVLNKRQRTAVDCAEQNSKIMELLQVVPSCVASLDDVAETDRKEYVTV 917 
V ++L + V++ V+S PV+ DV+E + +E ++ 
Sbjct: 782 VTDVLKV VTDETSFVLVSDKHRMS FPETVDEILDVSEDEGEELISF 827 

Query: 918 KIRKK 922 
K ++ 
Sbjct: 828 KAERR 832 

Score = 180 (27.0 bits), Expect = 5.0e-29, Sum P(2) = 5.0e-29 
Identities = 41/121 (33%), Positives = 67/121 (55%) 

Query: 486 GAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKASAEVQDNNGNTPLHLACTYGHEDCV 545 
G +N + +G LHLA ++G+ + + LLH + E GNT LH+A G ++ V 
Sbjct: 35 GVDINTCNQNGLNGLHLASKEGHVKMVVELLHKEIILETTTKKGNTALHIAALAGQDEVV 94 

Query: 546 KALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKETPLKCA 605 
+ LV Y ++ ++KG TPL++AA+ + V++ LL+NGA+ + TPL A 
Sbjct: 95 RELVNY GANVNAQSQKGFTPLYMAAQENHLEVVKFLLENGANQNVATEDGFTPLAVA 151 

Query: 606 L 606 
L 
Sbjct: 152 L 152 




Score = 166 (24.9 bits), Expect = 3.4e-06, Sum P(2) = 3.4e-06 
Identities = 89/318 (27%), Positives = 140/318 (44%) 

Query: 448 LNDPSVVTPFSRDDRGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKG 507 
L + + V ++DD+ TPLH AA G +++ LL+ AN G TPLH+A ++G 
Sbjct: 457 LQNKAKVNAKAKDDQ--TPLHCAARIGHTNMVKLLLENNANPNLATTAGHTPLHIAAREG 514 

Query: 508 YQSVTLLLLHYKASAEVQDNNGNTPLHLACTYGHEDCVKALVYYD 552 
+ L LL +AS G TPLH+A YG + L+ D 
Sbjct: 515 HVETVLALLEKEASQACMTKKGFTPLHVAAKYGKVRVAELLLERDAHPNAAGKNGLTPLH 574 

Query: 553 --VESCRLDI GNE KGDTPLHIAARWGYQGVIETLLQNGASTEIQNRL 597 
V LDI G+ G TPLHIAA+ V +LLQ G S + + 
Sbjct: 575 VAVHHNNLDIVKLLLPRGGSPHSPAWNGYTPLHIAAKQNQVEVARSLLQYGGSANAESVQ 634 

Query: 598 KETPLKCALNSKILSVMEAYHLSFERRQKSSEAPVQSPQRSVDSISQESSTSSFSSM-SA 656 
TPL A M A LS +Q + +S + ++QE + 
Sbjct: 635 GVTPLHLAAQEGHAE-MVALLLS---KQANGNLGNKSGLTPLHLVAQEGHVPVADVLIKH 690 

Query: 657 GSRQEETKKDYREVEKLLRAVADGDLEMVRYLLEWTEEDLEDAEDTVSAADPEFCHPLCQ 716 
G + T + LA G++++V++LL+ + D+ +A+ + + PL Q 
Sbjct: 691 G VMVDATTR--MGYTPLHVASHYGNIKLVKFLLQH-QADV-NAKTKLG YSPLHQ 740 

Query: 717 CPKCAPAQKRLAKVPASGLGVNVTSQDGSSPLHVA 751 
+ + + +G N S DG++PL +A 
Sbjct: 741 AAQQGHTDI-VTLLLKNGASPNEVSSDGTTPLAIA 774 




Score = 162 (24.3 bits), Expect = 1.8e-07, Sum P(2) = 1.8e-07 
Identities = 48/149 (32%), Positives = 71/149 (47%) 

Query: 737 VNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCL 796 
V D ++ AA G 0 L++G + N + LHLA ++GH ++V L 
Sbjct: 5 VGFREADAATSFLRAARSGNLDKALDHLRNGVDINTCNQNGLNGLHLASKEGHVKMVVEL 64 

Query: 797 LDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHV 856 
L GNT LA G E+V L+ +GA++NA + KG T L+ A E H+ 
Sbjct: 65 LHKEIILETTTKKGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENHL 124 

Query: 857 FVVELLLLHGASVQVLNKRQRTAVDCAEQ 885 
VV+ LL +GA+ V + T + A Q 
Sbjct: 125 EVVKFLLENGANQNVATEDGFTPLAVALQ 153 

Score = 158 (23.7 bits), Expect = 5.7e-26, Sum P(2) = 5.7e-26 
Identities = 38/135 (28%), Positives = 65/135 (48%) 

Query: 460 DDRGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYK 519 
+ G LH+A+ G ++ L+ K ++ T G T LH+A G V L++Y 
Sbjct: 42 NQNGLNGLHLASKEGHVKMVVELLHKEIILETTTKKGNTALHIAALAGQDEVVRELVNYG 101 

Query: 520 ASAEVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQG 579 
A+ Q G TPL++A H + VK L+ ++ EG TPL +A + G++ 
Sbjct: 102 ANVNAQSQKGFTPLYMAAQENHLEVVKFLLE---NGANQNVATEDGFTPLAVALQQGHEN 158 

Query: 580 VIETLLQNGASTEIQ 594 
V+ L+ G +++ 
Sbjct: 159 VVAHLINYGTKGKVR 173 
Score = 115 (17.3 bits), Expect = 1.8e-21, Sum P(2) = 1.8e-21 
Identities = 37/119 (31%), Positives = 58/119 (48%) 

Query: 497 ATPLHLACQKGYQSVTLLLLHYKASAEVQ--DNNGNTPLHLACTYGHEDCVKALVYYDVE 554 
AT A + G ++ L H + ++ + NG LHLA GH V L++ ++ 
Sbjct: 13 ATSFLRAARSG--NLDKALDHLRNGVDINTCNQNGLNGLHLASKEGHVKMVVELLHKEII 70 

Query: 555 SCRLDIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKETPLKCALNSKILSVM 614 
L+ +KG+T LHIAA G V+ L+ GA+ Q++ TPL A L V+ 
Sbjct: 71 LETTTKKGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENHLEVV 127 

Query: 615 E 615 
Sbjct: 128 K 128 



Score = 106 (15.9 bits), Expect = 1.8e-01, Sum P(2) = 1.6e-01 
Identities = 34/121 (28%), Positives = 54/121 (44%) 

Query: 769 NAGARNADQAVPLHLACQQGHFQVVKCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVAL 828 
+GRADA A+G+ L+ N++G LA GH ++V 
Sbjct: 4 SVGFREADAATSFLRAARSGNLDKALDHLRNGVDINTCNQNGLNGLHLASKEGHVKMVVE 63 

Query: 829 LLQHGASINASNNKGNTALHEAVIEKHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNSK 888 
LL + + KGNTALH A + VV L+ +GA+V +++ T + A Q + 
Sbjct: 64 LLHKEIILETTTKKGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENH 123 

Query: 889 I 889 
+ 
Sbjct: 124 L 124 
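
The Expect and P values in these HSP headers are two views of the same statistic. Under the usual Poisson assumption for high-scoring segment pairs, P = 1 - exp(-E), so the two values coincide when E is very small and separate as E approaches 1, which is what the weakest HSP above shows (Expect 1.8e-01 against P 1.6e-01). The sketch below illustrates that relation only; it is not code from the BLAST program, and the function name `expect_to_p` is hypothetical.

```python
import math


def expect_to_p(expect: float) -> float:
    """Convert an Expect value to a P value, P = 1 - exp(-E)."""
    # -expm1(-E) equals 1 - exp(-E) but keeps precision for tiny E
    return -math.expm1(-expect)


print(f"{expect_to_p(1.8e-01):.1e}")  # 1.6e-01
print(f"{expect_to_p(7.3e-31):.1e}")  # 7.3e-31
```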

Score = 40 (6.0 bits), Expect = 1.6e-14, Sum P(2) = 1.6e-14 
Identities = 11/56 (19%), Positives = 23/56 (41%) 

Query: 622 ERRQKSSEAPVQSPQRSVDSISQESSTSSFSSMSAGSRQEETKKDYREVEKLLRAV 677 
+RRQ+ EVQ+++Q++ Q++ +K++R V 
Sbjct: 1614 DRRQQGQEEQVQEAKNTFTQVVQGNEFQNIPGEQVTEEQFTDEQGNIVTKKIIRKV 1669 

Score = 38 (5.7 bits), Expect = 2.6e-14, Sum P(2) = 2.6e-14 
Identities = 6/12 (50%), Positives = 10/12 (83%) 

Query: 806 KDLSGNTPLIYA 817 
+D++G T L+YA 
Sbjct: 1186 EDITGTTKLVYA 1197 

Pedant information for DKFZphtes3_1817, frame 2 

Report for DKFZphtes3_1817.2 



[LENGTH] 1050 
[MW] 117013.72 
[pI] 6.47 
[HOMOL] TREMBL:DMANKY_1 product: "ankyrin"; Drosophila melanogaster ankyrin mRNA, complete cds. 2e-45 
[FUNCAT] 08.19 cellular import [S. cerevisiae, YOR034c] 5e-13 
[FUNCAT] 10.05.99 other pheromone response activities [S. cerevisiae, YDR264c] 3e-12 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YDR264c] 3e-12 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YIL112w] 2e-11 
[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YGR232w] 8e-10 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YIR033w] 2e-08 
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YIR033w] 2e-08 
[FUNCAT] 01.04.04 regulation of phosphate utilization [S. cerevisiae, YGR233c] 3e-08 
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YML097c] 5e-05 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YML097c] 5e-05 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YML097c] 5e-05 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YML097c] 5e-05 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YER111c] 3e-04 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YER111c] 3e-04 
[BLOCKS] BL00901A Cysteine synthase/cystathionine beta-synthase P-phosphate att 
[SCOP] d1awcb_ 1.91.3.1.2 GA binding protein (-GABP) alpha GA bindini 4e-12 
[EC] 3.1.3.53 Myosin-light-chain-phosphatase 1e-12 
[PIRKW] phosphotransferase 1e-19 
[PIRKW] nucleus 1e-13 
[PIRKW] (illegible) 
[PIRKW] (illegible) 
[PIRKW] (illegible) 
[PIRKW] (illegible) 
[PIRKW] (illegible) 
[PIRKW] heterodimer 1e-14 
[PIRKW] potassium transport 5e-15 
[PIRKW] cell cycle control 1e-10 
[PIRKW] serine/threonine-specific protein kinase 1e-19 
[PIRKW] transmembrane protein 5e-15 
[PIRKW] (illegible) 
[PIRKW] (illegible) 
[PIRKW] oncogene 1e-08 
[PIRKW] (illegible) 
[PIRKW] (illegible) 
[PIRKW] voltage-gated ion channel 5e-15 
[PIRKW] phosphoprotein 4e-38 
[PIRKW] apoptosis 1e-19 
[PIRKW] liver (e-value illegible) 
[PIRKW] integrin binding 3e-16 
[PIRKW] differentiation 2e-12 
[PIRKW] transforming protein 1e-08 
[PIRKW] alternative splicing 1e-40 
[PIRKW] coiled coil 1e-14 
[PIRKW] (illegible) 
[PIRKW] transcription factor 4e-16 
[PIRKW] transcription regulation 2e-16 
[PIRKW] nucleotide binding 5e-15 
[PIRKW] phosphoric monoester hydrolase 1e-12 
[PIRKW] cytoskeleton 8e-39 
[PIRKW] calmodulin binding 1e-19 
[PIRKW] smooth muscle 1e-12 
[SUPFAM] ankyrin 1e-40 
[SUPFAM] death-associated protein kinase 1e-19 
[SUPFAM] ankyrin repeat homology 1e-40 
[SUPFAM] protein kinase homology 1e-19 
[SUPFAM] vaccinia virus 27.4K HindIII-C protein homology 3e-07 
[SUPFAM] int-3 transforming protein 1e-08 
[SUPFAM] unassigned ankyrin repeat proteins 2e-38 
[SUPFAM] notch protein 2e-12 
[SUPFAM] fowlpox virus BamHI-ORF7 protein 2e-13 
[SUPFAM] rel homology 2e-11 
[SUPFAM] EGF homology 2e-12 
[PROSITE] ATP_GTP_A 1 
[PFAM] Ank repeat 
[KW] Irregular 
[KW] 3D 
[KW] LOW_COMPLEXITY 3.05 % 



SEQ MALYDEDLLKNPFYLALQKCRPDLCSKVAQIHGIVLVPCKGSLSSSIQSTCQFESYILIP 

SEG 

lawcB 

SEQ VEEHFQTLNGKDVFIQGNRIKLGAGFACLLSVPILFEETFYNEKEESFSILCIAHPLEKR 

SEG 

lawcB 

SEQ ESSEEPLAPSDPFSLKTIEDVREFLGRHSERFDRNIASFHRTFRECERKSLRHHIDSANA 

SEG 

lawcB 

SEQ LYTKCLQQLLRDSHLKMLAKQEAQMNLMKQAVEIYVHHEIYNLIFKYVGTMEASEDAAFN 

SEG 

lawcB 

SEQ KITRSLQDLQQKDIGVKPEFSFNIPRAKRELAQLNKCTSPQQKLVCLRKVVQLITQSPSQ 

SEG 

lawcB 

SEQ RVNLETMCADDLLSVLLYLLVKTEIPNWMANLSYIKNFRFSSLAKDELGYCLTSFEAAIE 

SEG xxxxxxxxxx 

lawcB 

SEQ YIRQGSLSAKPPESEGFGDRLFLKQRMSLLSQMTSSPTDCLFKHIASGNQKEVERLLSQE 

SEG 

lawcB 

SEQ DHDKDTVQKMCHPLCFCDDCEKLVSGRLNDPSVVTPFSRDDRGHTPLHVAAVCGQASLID 

SEG 

lawcB 
SEQ LLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKASAEVQDNNGNTPLHLACTYG 

SEG 

lawcB 

SEQ HEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKET 

SEG 

lawcB 



SEQ PLKCALNSKILSVMEAYHLSFERRQKSSEAPVQSPQRSVDSISQESSTSSFSSMSAGSRQ 

SEG xxxxxxxxxxxxxxxxxxxxxx 

lawcB 

SEQ EETKKDYREVEKLLRAVADGDLEMVRYLLEWTEEDLEDAEDTVSAADPEFCHPLCQCPKC 

SEG 

lawcB 

SEQ APAQKRLAKVPASGLGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVP 

SEG 

lawcB CHHHHHHHHHHHCCHHHHHHHHHHCCCC-CCTTTTCCH 

SEQ LHLACQQGHFQVVKCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASN 

SEG 

lawcB HHHHHHHCCHHHHHHHHHCCCTTTTCTTTTCCHHHHHHHHTTHHHHHHHHHCCCTTTTEE 

SEQ NKGNTALHEAVIEKHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNSKIMELLQVVPSCV 



SEG 

lawcB TTTEEHHHHHHHHCCHHHHHHHHHHCCTTTTCBTTTBCHHHHHHHHCCHHHHHC 

SEQ ASLDDVAETDRKEYVTVKIRKKWNSKLYDLPDEPFTRQFYFVHSAGQFKGKTSREIMARD 

SEG 

lawcB 

SEQ RSVPNLTEGSLHEPGRQSVTLRQNNLPAQSGSHAAEKGNSDWPERPGLTQTGPGHRRMLR 

SEG 

lawcB 

SEQ RHTVEDAVVSQGPEAAGPLSTPQEVSASRS 

SEG 

lawcB 



Prosite for DKFZphtes3_1817.2 
PS00017 945->953 ATP_GTP_A PDOC00017 
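
The Prosite hit above can be reproduced with a regular-expression scan. The sketch below uses the PROSITE PS00017 consensus, [AG]-x(4)-G-K-[ST], and applies it to residues 901-1000 of the peptide as listed earlier in this report. The helper name `find_motif` and the choice of 1-based inclusive coordinates are assumptions made for illustration.

```python
import re

# PROSITE PS00017 (ATP_GTP_A, P-loop): [AG]-x(4)-G-K-[ST]
P_LOOP = re.compile(r"[AG].{4}GK[ST]")

# Residues 901-1000 of the frame 2 peptide, copied from the listing
FRAGMENT_START = 901
fragment = (
    "ASLDDVAETDRKEYVTVKIRKKWNSKLYDLPDEPFTRQFYFVHSAGQFKG"
    "KTSREIMARDRSVPNLTEGSLHEPGRQSVTLRQNNLPAQSGSHAAEKGNS"
)


def find_motif(pattern, seq, offset):
    """Return (start, end, match) tuples in 1-based inclusive coordinates."""
    return [
        (m.start() + offset, m.end() + offset - 1, m.group())
        for m in pattern.finditer(seq)
    ]


print(find_motif(P_LOOP, fragment, FRAGMENT_START))
# [(945, 952, 'AGQFKGKT')]
```

With inclusive coordinates the match spans residues 945 to 952, while the report prints 945->953, so the end position in the report appears to point one residue past the match.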



Pfam for DKFZphtes3_1817.2 



HMM_NAME Ank repeat 

HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN* 

G+TPLH+AA ++ +.+++LL+++GA +N 
Query 463 GHTPLHVAAVCGQASLIDLLVSKGAMVN 490 

32.12 (bits) f: 496 t: 523 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
Query *GyTPLHIAARyNNvEMVrlLLQHGADIN* 
G TPLH+A++ + ++ LLL + A+ 

dkfzphtes3 496 GATPLHLACQKGYQSVTLLLLHYKASAE 523 

Query f: 529 t: 556 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN* 

G+TPLH+A+ Y+++++V+ L+ + 
Query 529 GNTPLHLACTYGHEDCVKALVYYDVESC 556 

42.65 (bits) f: 565 t: 592 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
Query *GyTPLHIAARyNNvEMVrlLLQHGADIN* 
G+TPLHIAAR + +++ LLQ+GA+ 

dkfzphtes3 565 GDTPLHIAARWGYQGVIETLLQNGASTE 592 

Query f: 744 t: 771 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN* 

G +PLH+AA +++ +++RLLL+HGA+ 
Query 744 GSSPLHVAALHGRADLIRLLLKHGANAG 771 
36.38 (bits) f: 777 t: 804 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
Query *GyTPLHIAARyNNvEMVrlLLQHGADIN* 
PLH+A+++++ ++V+ LL+ +A +N 

dkfzphtes3 777 QAVPLHLACQQGHFQVVKCLLDSNAKPN 804 

Query f: 810 t: 837 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN* 

G+TPL++A+ ++ E+V LLLQHGA+IN 
Query 810 GNTPLIYACSGGHHELVALLLQHGASIN 837 

44.62 (bits) f: 843 t: 870 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
Query *GyTPLHIAARyNNvEMVrlLLQHGADIN* 
G+T+LH A+++ +V +V+LLL HGA++ 

dkfzphtes3 843 GNTALHEAVIEKHVFVVELLLLHGASVQ 870 
group: testes derived 

DKFZphtes3_19f19 encodes a novel 254 amino acid protein with weak similarity to S. cerevisiae 
protein YFL046w. 

The protein contains an RGD cell attachment site. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to YFL046w 

localisation: 3 STS match perfect but HS1292427 matches to chromosome 4 
Sequenced by MediGenomix 

Locus: /map="405.0/.3 cR from top of Chr11 linkage group" 
Insert length: 1395 bp 

Poly A stretch at pos. 1367, no polyadenylation signal found 



1 GGGACCACGG TGGCGCCTGC GCTGGGAGGT GAGCTTGTGA CAGAGCGAAA 
51 ACTACAATTC CCAGCATTCC TGTGGTGCCA GAACTACCTT GCCCGAAAGC 
101 CTGTGCGAGA TTTACCCCGT CTTCCGCCTC CCTCCCACCG GAAAACTCTG 
151 AGGACATGAA TAGTCGCCAG GCTTGGCGGC TCTTTCTCTC CCAAGGCAGA 
201 GGAGATCGTT GGGTTTCAAG GCCCCGCGGG CATTTCTCGC CGGCCCTGCG 
251 GAGAGAGTTC TTCACTACCA CAACCAAGGA GGGATATGAT AGGCGGCCAG 
301 TGGATATAAC TCCTTTAGAA CAAAGGAAAT TAACTTTTGA TACCCATGCA 
351 TTGGTTCAGG ACTTGGAAAC TCATGGATTT GACAAAACAC AAGCAGAAAC 
401 AATTGTATCA GCGTTAACTG CTTTATCAAA TGTCAGCCTG GATACTATCT 
451 ATAAAGAGAT GGTCACTCAA GCTCAACAGG AAATAACAGT ACAACAGCTA 
501 ATGGCTCATT TGGATGCTAT CAGGAAAGAC ATGGTCATCC TAGAGAAAAG 
551 TGAATTTGCA AATCTGAGAG CAGAGAATGA GAAAATGAAA ATTGAATTAG 
601 ACCAAGTTAA GCAACAACTA ATGCATGAAA CCAGTCGAAT CAGAGCAGAT 
651 AATAAACTGG ATATCAACTT AGAAAGGAGC AGAGTAACAG ATATGTTTAC 
701 AGATCAAGAA AAGCAACTTA TGGAAACAAC TACAGAATTT ACAAAAAAGG 
751 ATACTCAAAC CAAAAGTATT ATTTCAGAGA CCAGTAATAA AATTGACGCT 
801 GAAATTGCTT CCTTAAAAAC ACTGATGGAA TCTAACAAAC TTGAGACAAT 
851 TCGTTATCTT GCAGCTTCGG TGTTTACTTG CCTGGCAATA GCATTGGGAT 
901 TTTATAGATT CTGGAAGTAG TATTAATGCT CATCCTGCTG TGGCTGTTGG 
951 CTTCTTAGAA CACCAAACCG GGAGAGATTT ACTTTGAACA TTGTCAGTTG 
1001 CAGCAAAAAT TTACTACACA AGATTATTCG AAGTGTATAC GGACTAAAAG 
1051 AGGAAGTGTT TTAGAATGAG AAGAGATACT GTGTCTTTAT TGTGTGTGTG 
1101 TGAGTGCAGG TGTGTGTCTT TATTATATTG AAAAGCTGTC ACTCAGACCT 
1151 GGTTTGAGAT AGAAGAGCAT TTTGTCCTTT TGATAGTTAA TAGAAATTGA 
1201 ACCAGAGTTT TCTTATGTTT GCTTGAACAG TTGTGTAAAT CATACAGGAT 
1251 TTTGTGGGTA TTGGTTGAAT ATTTGTAAAC CATTCCCTAG CCTACATATT 
1301 TATTACTGAA TTAACTTTCC TGATAACCAT TGCATAATTA CATTTTTCTA 
1351 TAAAATGAAA GATTATTACA ACAAAAAAAA AAAAAAAAAA AAAAA 



BLAST Results 



Entry HS419346 from database EMBL: 
human STS WI-13569. 
Score = 2154, P = 8.6e-91, identities = 446/459 

Entry HS1292427 from database EMBL: 
human STS SHGC-50338. 
Score = 1737, P = 7.2e-72, identities = 359/369 

Entry HS253344 from database EMBL: 
human STS WI-13893. 
Score = 1578, P = 1.0e-64, identities = 358/397 



Medline entries 



No Medline entry 






Peptide information for frame 3 



ORF from 156 bp to 917 bp; peptide length: 254 
Category: similarity to unknown protein 
Classification: no clue 
Prosite motifs: RGD (15-18) 



1 MNSRQAWRLF LSQGRGDRWV SRPRGHFSPA LRREFFTTTT KEGYDRRPVD 

51 ITPLEQRKLT FDTHALVQDL ETHGFDKTQA ETIVSALTAL SNVSLDTIYK 

101 EMVTQAQQEI TVQQLMAHLD AIRKDMVILE KSEFANLRAE NEKMKIELDQ 

151 VKQQLMHETS RIRADNKLDI NLERSRVTDM FTDQEKQLME TTTEFTKKDT 

201 QTKSIISETS NKIDAEIASL KTLMESNKLE TIRYLAASVF TCLAIALGFY 

251 RFWK 
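
The start of this peptide can be checked against the nucleotide listing by translating from the first base of the ORF. The sketch below builds the standard genetic code table and translates the 50 bases numbered 151 to 200 of the insert, beginning at position 156. The function and variable names are hypothetical, and the use of the standard code (NCBI translation table 1) is an assumption.

```python
# Standard genetic code, codons ordered by base T, C, A, G at each position
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, ignoring trailing bases."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Bases 151-200 of the insert, as given in the nucleotide listing
ROW_START = 151
row_151 = "AGGACATGAATAGTCGCCAGGCTTGGCGGCTCTTTCTCTCCCAAGGCAGA"

ORF_START = 156  # first base of the ORF quoted above
coding = row_151[ORF_START - ROW_START:]
print(translate(coding))  # MNSRQAWRLFLSQGR
```

The translated residues agree with the first fifteen residues of the peptide listed above.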



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_19f19, frame 3 

SWISSPROT:YAN8_SCHPO HYPOTHETICAL 24.6 KD PROTEIN C3H1.08 IN CHROMOSOME 
I., N = 1, Score = 144, P = 8.4e-09 

PIR:S56209 probable membrane protein YFL046w - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 138, P = 5.4e-08 

>SWISSPROT:YAN8_SCHPO HYPOTHETICAL 24.6 KD PROTEIN C3H1.08 IN CHROMOSOME I. 
Length = 211 

HSPs: 

Score = 144 (21.6 bits), Expect = 8.4e-09, P = 8.4e-09 
Identities = 34/121 (28%), Positives - 67/121 (55%) 

Query: 70 LETHGFDKTQAETIVSALTALSNVSLDTIYKEMVTQAQQE-ITVQQLMAHLDAIRKDMVI 128 

LE G+ AETI + + ++ +L + K + +A+QE ++ QQ L IRK + 
Sbjct: 46 LEQAGYSVKNAETITNLMRTITGEALTELEKNIGFKAKQESVSFQQKRTFLQ-IRKYLET 104 

Query: 129 LEKSEFANLRAENEKMKIELDQVKQQLMHETSRIRADNKLDINLERSRVTDMFTDQEKQL 188 

+E++EF +R ++K+ E+++ K L + ++ +L++NLE+ R+ D T + + 

Sbjct: 105 IEENEFDKVRKSSDKLINEIEKTKSSLREDVKTALSEVRLNLNLEKGRMKDAATSRNTNI 164 

Query: 189 ME 190 
E 

Sbjct: 165 HE 166 

Pedant information for DKFZphtes3_19f19, frame 3

Report for DKFZphtes3_19f19.3

[LENGTH] 254
[MW] 29505.73
[pI] 6.99
[HOMOL] PIR:S56209 probable membrane protein YFL046w - yeast (Saccharomyces cerevisiae) 2e-10
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YFL046w] 8e-12
[PROSITE] RGD 1
[KW] TRANSMEMBRANE 1
[KW] LOW_COMPLEXITY 5.12 %
[KW] COILED_COIL 11.02 %



SEQ MNSRQAWRLFLSQGRGDRWVSRPRGHFSPALRREFFTTTTKEGYDRRPVDITPLEQRKLT 

SEG 

PRD ccchhhhhhhhhccccceeeeccccccchhhhhhheeeeccccccccccccchhhhhhcc

COILS 

MEM 

SEQ FDTHALVQDLETHGFDKTQAETIVSALTALSNVSLDTIYKEMVTQAQQEITVQQLMAHLD 

SEG 

PRD chhhhhhhhhhhcccccchhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS

MEM

SEQ AIRKDMVILEKSEFANLRAENEKMKIELDQVKQQLMHETSRIRADNKLDINLERSRVTDM 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCC 

MEM 

SEQ FTDQEKQLMETTTEFTKKDTQTKSIISETSNKIDAEIASLKTLMESNKLETIRYLAASVF 

SEG xxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhcccccccceeeeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 

MEM MMMMMMM 

SEQ TCLAIALGFYRFWK 

SEG 

PRD hhhhhhhhhhhccc 

COILS 

MEM MMMMMMMMMM .... 
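
The LOW_COMPLEXITY and COILED_COIL percentages in the Pedant report appear to be the fraction of residues flagged in the SEG and COILS tracks above (13 `x` and 28 `C` characters over 254 residues). The following is a minimal sketch of that arithmetic; the function name `percent_flagged` is hypothetical, and the assumption that Pedant derives the percentage this way is inferred from the numbers in this listing, not from the tool's documentation.

```python
def percent_flagged(track, sequence_length, flag_chars):
    """Percentage of residues flagged in an annotation track.

    track           -- concatenated annotation characters (e.g. the SEG lines)
    sequence_length -- full peptide length
    flag_chars      -- characters that mark a flagged residue
    """
    flagged = sum(1 for ch in track if ch in flag_chars)
    return round(100.0 * flagged / sequence_length, 2)


# Track contents copied from the listing for the 254-residue frame 3 peptide.
seg_track = "x" * 13      # SEG: 13 low-complexity residues
coils_track = "C" * 28    # COILS: 28 coiled-coil residues

low_complexity = percent_flagged(seg_track, 254, "x")
coiled_coil = percent_flagged(coils_track, 254, "C")
```

Under that assumption the two values reproduce the 5.12 % and 11.02 % figures reported for this peptide.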



Prosite for DKFZphtes3_19f19.3
PS00016 15->18 RGD PDOC00016 
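
As an illustration of how such a Prosite hit can be checked by hand, the sketch below searches the first 50 residues of the frame 3 peptide for the RGD tripeptide. The helper `find_motif` is hypothetical; the `start->end` convention (end = start + motif length, 1-based) is an assumption inferred from the `15->18` entry for a three-residue motif.

```python
import re

# First 50 residues of the frame 3 peptide, copied from the listing.
PEPTIDE_START = "MNSRQAWRLFLSQGRGDRWVSRPRGHFSPALRREFFTTTTKEGYDRRPVD"


def find_motif(sequence, pattern):
    """Return (start, end) pairs with a 1-based start and end = start + match length."""
    return [(m.start() + 1, m.end() + 1) for m in re.finditer(pattern, sequence)]


hits = find_motif(PEPTIDE_START, "RGD")
for start, end in hits:
    print(f"{start}->{end} RGD")
```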



(No Pfam data available for DKFZphtes3_19f19.3)



DKFZphtes3_19j17



group: testes derived 

DKFZphtes3_19j17 encodes a novel 436 amino acid protein with partial similarity to C. elegans
Y40B1A.2 protein.

The novel protein contains two Prosite WW/rsp5/WWP domain signatures. 

The WW domain (or rsp5 or WWP domain) was originally discovered as a short conserved
region in a number of unrelated proteins, such as dystrophin, utrophin, vertebrate YAP
protein, mouse NEDD-4 and yeast RSP5.

The domain is repeated up to 4 times in some proteins. It has been shown to bind proteins with
particular proline motifs, [AP]-P-P-[AP]-Y, and thus somewhat resembles SH3 domains. It
appears to contain beta-strands grouped around four conserved aromatic positions, generally
Trp. The name WW or WWP derives from the presence of these Trp residues as well as that of a
conserved Pro. It is frequently associated with other domains typical for proteins in signal
transduction processes.
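
The ligand motif [AP]-P-P-[AP]-Y quoted above translates directly into a regular expression. The sketch below is illustrative only: the function name `find_ww_ligand_motifs` and the test peptide `GTPPPAYSE` are hypothetical and are not taken from this listing.

```python
import re

# [AP]-P-P-[AP]-Y written as a regular expression over one-letter amino acid codes.
WW_LIGAND = re.compile(r"[AP]PP[AP]Y")


def find_ww_ligand_motifs(sequence):
    """Return (1-based start, matched text) for each non-overlapping motif occurrence."""
    return [(m.start() + 1, m.group()) for m in WW_LIGAND.finditer(sequence.upper())]


# Hypothetical peptide used only to exercise the pattern.
example_hits = find_ww_ligand_motifs("GTPPPAYSE")
```

A scan of this kind identifies candidate binding partners of a WW domain, not the WW domain itself, which is detected with its own Prosite and Pfam signatures.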

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.



similarity to C.elegans Y40B1A.2 

there are two long ORFs in this cDNA according to EST: 
HS12146/HS75086/AA923755/MMAA17335 remaining intron at Bp 1506-1733 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 2762 bp

Poly A stretch at pos. 2740, no polyadenylation signal found



1 ATTCTCAGCC AAATTTTTTT ATTTTTTGCA GAATCAGTGT GCAAGGTGGT 
51 TTATAAGATA ATGGAGTGGT TTTTTTTTGT GTTTAGTGTG ATTTGTTATC 
101 AGGAGTCTTA TTGTAACGCT TAAGCATTAG GTTTTTTGTC TGAGAAACTT 
151 TAAAGAGTAA AGCAGAATTG AAAGTGGAAA TTTTAATTTT GTAAGTTCAT 
201 AAAATTTAAT GATAATACAC CAAAGTTTAT GTTTAAATTA GGGAGTTTAA 
251 GGTTTCAATT CTTTCTCTTT TTTTTTGGGG GGGTGATGTT TTACAGGCAC 
301 TTAAGTATTC ATCGAAGAGT CACCCCAGTA GCGGTGATCA CAGACATGAA 
351 AAGATGCGAG ACGCCGGAGA TCCTTCACCA CCAAATAAAA TGTTGCGGAG 
401 ATCTGATAGT CCTGAAAACA AATACAGTGA CAGCACAGGT CACAGTAAGG 
451 CCAAAAATGT GCATACTCAC AGAGTTAGAG AGAGGGATGG TGGGACCAGT 
501 TACTCTCCAC AAGAAAATTC ACACAACCAC AGTGCTCTTC ATAGTTCAAA 
551 TTCACATTCT TCTAATCCAA GCAATAACCC AAGCAAAACT TCAGATGCAC 
601 CTTATGATTC TGCAGATGAC TGGTCTGAGC ATATTAGCTC TTCTGGGAAA 
651 AAGTACTACT ACAATTGTCG AACAGAAGTT TCACAATGGG AAAAACCAAA 
701 AGAGTGGCTT GAAAGAGAAC AGAGACAAAA AGAAGCAAAC AAGATGGCAG 
751 TCAACAGCTT CCCAAAAGAT AGGGATTACA GAAGAGAGGT GATGCAAGCA 
801 ACAGCCACTA GTGGGTTTGC CAGTGGAATG GAAGACAAGC ATTCCAGTGA 
851 TGCCAGTAGT TTGCTCCCAC AGAATATTTT GTCTCAAACA AGCAGACACA 
901 ATGACAGAGA CTACAGACTG CCAAGAGCAG AGACTCACAG TAGTTCTACG 
951 CCAGTACAGC ACCCCATCAA ACCAGTGGTT CATCCAACTG CTACCCCAAG 
1001 CACTGTTCCT TCTAGTCCAT TTACGCTACA GTCTGATCAC CAGCCAAAGA 
1051 AATCATTTGA TGCTAATGGA GCATCTACTT TATCAAAACT GCCTACACCC 
1101 ACATCTTCTG TCCCTGCACA GAAAACAGAA AGAAAAGAAT CTACATCAGG 
1151 AGACAAACCC GTATCACATT CTTGCACAAC TCCTTCCACG TCTTCTGCCT 
1201 CTGGACTGAA CCCCACATCT GCACCTCCAA CATCTGCTTC AGCGGTCCCT 
1251 GTTTCTCCTG TTCCACAGTC GCCAATACCT CCCTTACTTC AGGACCCAAA 
1301 TCTTCTTAGA CAATTGCTTC CTGCTTTGCA AGCCACGCTG CAGCTTAATA 
1351 ATTCTAATGT GGACATATCT AAAATAAATG AAGTTCTTAC AGCAGCTGTG 
1401 ACACAAGCCT CACTGCAGTC TATAATTCAT AAGTTTCTTA CTGCTGGACC 
1451 ATCTGCTTTC AACATAACGT CTCTGATTTC TCAAGCTGCT CAGCTCTCTA 
1501 CACAAGATAT CCCTCTTCAT GAAGGTATCC AAATGGAGAG AGATACACAT 
1551 AGGAGCAAAT GGGAAGTGAA AGGGTCACTT TGTCAGAAAG CTGATAAACA 
1601 GCAGGAATGC CTTGTCTGGA ATGGAAGTAT AATGGTGCAA AGACTCTTGC 
1651 AACCCTCTGG CTAGCCTCAT GAGCAGGAGA CTGCGTGGGA TACCTGGGCC 
1701 TAAATGTAGA ATAAGAAAGA AGAAATAAGG ATGCCCAGCC ATCTAATCAG 
1751 TCTCCGATGT CTTTAACATC TGATGCGTCA TCCCCAAGAT CATATGTTTC 
1801 TCCAAGAATA AGCACACCTC AAACTAACAC AGTCCCTATC AAACCTTTGA 
1851 TCAGTACTCC TCCTGTTTCA TCACAGCCAA AGGTTAGTAC TCCAGTAGTT 
1901 AAGCAAGGAC CAGTGTCACA GTCAGCCACA CAGCAGCCTG TAACTGCTGA 
1951 CAAGCAGCAA GGTCATGAAC CTGTCTCTCC TCGAAGTCTT CAGCGCTCAA 
2001 GCCAGAGAAG TCCATCACCT GGTCCCAATC ATACTTCTAA TAGTAGTAAT 
2051 GCATCAAATG CAACAGTTGT ACCACAGAAT TCTTCTGCCC GATCCACGTG
2101 TTCATTAACG CCTGCACTAG CAGCACACTT CAGTGAAAAT CTCATAAAAC
2151 ACGTTCAAGG ATGGCCTGCA GATCATGCAG AGAAGCAGGC ATCAAGATTA
2201 CGCGAAGAAG CGCATAACAT GGGAACTATT CACATGTCCG AAATTTGTAC
2251 TGAATTAAAA AATTTAAGAT CTTTAGTCCG AGTATGTGAA ATTCAAGCAA
2301 CTTTGCGAGA GCAAAGGATA CTATTTTTGA GACAACAAAT TAAGGAACTT
2351 GAAAAGCTAA AAAATCAGAA TTCCTTCATG GTGTGAAGAT GTGAATAATT
2401 GCACATGGTT TTGAGAACAG GAACTGTAAA TCTGTTGCCC AATCTTAACA
2451 TTTTTGAGCT GCATTTAAGT AGACTTTGGA CCGTTAAGCT GGGCAAAGGA
2501 AATGACAAGG GGACGGGGTC TGTGAGAGTC AATTCAGGGG AAAGATACAA
2551 GATTGATTTG TAAAACCCTT GAAATGTAGA TTTCTTGTAG ATGTATCCTT
2601 CACGTTGTAA ATATGTTTTG TAGAGTGAAG CCATGGGAAG CCATGTGTAA
2651 CAGAGCTTAG ACATCCAAAA CTAATCAATG CTGAGGTGGC TAAATACCTA
2701 GCCTTTTACA TGTAAACCTG TCTGCAAAAT TAGCTTTTTT AAAAAAAAAA
2751 AAAAAAAAAA AA



BLAST Results 



Entry AC005876 from database EMBLNEW: 

Homo sapiens chromosome 10 clone CIT987SK-1188I5 map 10p11.2-10p12.1,
complete sequence.

Score = 2130, P = 0.0e+00, identities = 426/426
12 exons matching Bp 492-2740



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 1757 bp to 2383 bp; peptide length: 209 
Category: questionable ORF 
Classification: no clue 
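
The ORF coordinates, frame number and peptide length given in these reports are related by simple arithmetic, sketched below. The function name `orf_summary` is hypothetical. The sketch assumes 1-based inclusive coordinates that exclude the stop codon, which is inferred from the figures in this listing (for example, 1757 to 2383 bp gives 627 bases, or 209 codons) and is not stated by the source.

```python
def orf_summary(start_bp, end_bp):
    """Return (frame, peptide_length) for an ORF given 1-based inclusive coordinates.

    Assumes the coordinates cover only coding codons (no stop codon).
    """
    n_bases = end_bp - start_bp + 1
    if n_bases % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    frame = (start_bp - 1) % 3 + 1   # frame 1 starts at base 1, frame 2 at base 2, ...
    return frame, n_bases // 3


frame2 = orf_summary(1757, 2383)   # questionable ORF reported for frame 2
frame3 = orf_summary(354, 1661)    # ORF reported for frame 3 of the same cDNA
```

Both results agree with the frame numbers and peptide lengths (209 and 436) reported for this cDNA.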



1 MSLTSDASSP RSYVSPRIST PQTNTVPIKP LISTPPVSSQ PKVSTPVVKQ
51 GPVSQSATQQ PVTADKQQGH EPVSPRSLQR SSQRSPSPGP NHTSNSSNAS
101 NATVVPQNSS ARSTCSLTPA LAAHFSENLI KHVQGWPADH AEKQASRLRE 
151 EAHNMGTIHM SEICTELKNL RSLVRVCEIQ ATLREQRILF LRQQIKELEK 
201 LKNQNSFMV 



BLASTP hits 
No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_19j17, frame 2
No Alert BLASTP hits found 



Peptide information for frame 3 



ORF from 354 bp to 1661 bp; peptide length: 436 
Category: similarity to unknown protein 
Classification: unclassified 
Prosite motifs: WW_DOMAIN_1 (90-116)
WW_DOMAIN_1 (90-116)



1 MRDAGDPSPP NKMLRRSDSP ENKYSDSTGH SKAKNVHTHR VRERDGGTSY 
51 SPQENSHNHS ALHSSNSHSS NPSNNPSKTS DAPYDSADDW SEHISSSGKK 
101 YYYNCRTEVS QWEKPKEWLE REQRQKEANK MAVNSFPKDR DYRREVMQAT 
151 ATSGFASGME DKHSSDASSL LPQNILSQTS RHNDRDYRLP RAETHSSSTP 
201 VQHPIKPVVH PTATPSTVPS SPFTLQSDHQ PKKSFDANGA STLSKLPTPT 
251 SSVPAQKTER KESTSGDKPV SHSCTTPSTS SASGLNPTSA PPTSASAVPV 
301 SPVPQSPIPP LLQDPNLLRQ LLPALQATLQ LNNSNVDISK INEVLTAAVT 
351 QASLQSIIHK FLTAGPSAFN ITSLISQAAQ LSTQDIPLHE GIQMERDTHR 
401 SKWEVKGSLC QKADKQQECL VWNGSIMVQR LLQPSG 



BLASTP hits
No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_19j17, frame 3

TREMBL:CEY40B1A_2 gene: "Y40B1A.2"; Caenorhabditis elegans cosmid
Y40B1A, N = 1, Score = 144, P = 1.8e-09



>TREMBL:CEY40B1A_2 gene: "Y40B1A.2"; Caenorhabditis elegans cosmid Y40B1A
Length = 120

HSPs:

Score = 144 (21.6 bits), Expect = 1.8e-09, P = 1.8e-09
Identities = 30/67 (44%), Positives = 43/67 (64%)

Query: 90 WSEHISSSGKKYYYNCRTEVSQWEKPKEW-LEREQRQKEANKMAVNSFPK DRDYRRE 145 

W+E +SSSGK YYYN +TE+SQW+KP EW E +++ K VN P+ DR Y 
Sbjct: 11 WTEQMSSSGKMYYYNKKTEISQWDKPAEWPAEGGSAERDKPKGGVNEKPRFAEDR-YNEY 69 

Query: 146 VMQATATS 153 

+ Q +++S 
Sbjct: 70 IGQLSSSS 77 



Pedant information for DKFZphtes3_19j17, frame 2



Report for DKFZphtes3_19j17.2



[LENGTH] 209 

[MW] 22873.85 

[pI] 9.95

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 13.40 % 



SEQ MSLTSDASSPRSYVSPRISTPQTNTVPIKPLISTPPVSSQPKVSTPVVKQGPVSQSATQQ 

SEG 

PRD ccccccccccccccccccccccceeeeccccccccccccccccccceeeccccccccccc 

SEQ PVTADKQQGHEPVSPRSLQRSSQRSPSPGPNHTSNSSNASNATVVPQNSSARSTCSLTPA 

SEG xxxxxxxxxxxxxxx . . xxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccceeeeccccccccccchhh 

SEQ LAAHFSENLIKHVQGWPADHAEKQASRLREEAHNMGTIHMSEICTELKNLRSLVRVCEIQ 

SEG 

PRD hhhhhhcchhhhhhccccchhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhh 

SEQ ATLREQRILFLRQQIKELEKLKNQNSFMV 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhcccccc 



(No Prosite data available for DKFZphtes3_19j17.2)
(No Pfam data available for DKFZphtes3_19j17.2)



Pedant information for DKFZphtes3_19j17, frame 3



Report for DKFZphtes3_19j17.3



[LENGTH] 436

[MW] 47716.62

[pI] 8.71

[HOMOL] TREMBL:CEY40B1A_2 gene: "Y40B1A.2"; Caenorhabditis elegans cosmid Y40B1A 6e-08

[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YKL012w] 2e-04

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL012w] 2e-04

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPR152c] 6e-04

[BLOCKS] BL01159 WW/rsp5/WWP domain proteins

[PROSITE] WW_DOMAIN_1 2

[PFAM] WW/rsp5/WWP domain containing proteins 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 22.48 %

SEQ MRDAGDPSPPNKMLRRSDSPENKYSDSTGHSKAKNVHTHRVRERDGGTSYSPQENSHNHS

SEG xxxxxx 

PRD ccccccccccccccccccccccccccccccccccccceeeeeeccccccccccccccccc 

SEQ ALHSSNSHSSNPSNNPSKTSDAPYDSADDWSEHISSSGKKYYYNCRTEVSQWEKPKEWLE 

SEG xxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccceeeccccceeeeeeccccccccccchhhh 

SEQ REQRQKEANKMAVNSFPKDRDYRREVMQATATSGFASGMEDKHSSDASSLLPQNILSQTS 

SEG 

PRD hhhhhhhhhhhhcccccccchhhhhhhhhhcccccccccccccccccccccccccccccc 

SEQ RHNDRDYRLPRAETHSSSTPVQHPIKPVVHPTATPSTVPSSPFTLQSDHQPKKSFDANGA 

SEG xxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccceeeeccccccccccccccccccccccccccccccc 

SEQ STLSKLPTPTSSVPAQKTERKESTSGDKPVSHSCTTPSTSSASGLNPTSAPPTSASAVPV 

SEG xxxxxxxxxxxx xxxxxxxxxxxxxx 

PRD CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ SPVPQSPIPPLLQDPNLLRQLLPALQATLQLNNSNVDISKINEVLTAAVTQASLQSIIHK 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccchhhhhhhhhhhhhhhcccccccccchhhhhhhhhhhhhhhhhhh 

SEQ FLTAGPSAFNITSLISQAAQLSTQDIPLHEGIQMERDTHRSKWEVKGSLCQKADKQQECL 

SEG 

PRD hhcccccceeehhhhhhhhhhhccccccccccccccccccceeeecccchhhhhhhccee 

SEQ VWNGSIMVQRLLQPSG 

SEG 

PRD eeccchhhhhhccccc 



Prosite for DKFZphtes3_19j17.3

PS01159 90->116 WW_DOMAIN_1 PDOC50020

PS01159 90->116 WW_DOMAIN_1 PDOC50020



Pfam for DKFZphtes3_19j17.3



HMM_NAME WW/rsp5/WWP domain containing proteins 

HMM *LPsGWEeHWDpsGRpWYYWNHETkTTQWEpP* 

+ ++W EH++ SG+ YY+N T+ +QWE+P 
Query 86 SADDWSEHISSSGKK-YYYNCRTEVSQWEKP 115



DKFZphtes3_lcl



group: signal transduction 

DKFZphtes3_lcl encodes a novel 632 amino acid putative GTPase-activating protein, related to
Drosophila rotund transcript and human n-chimaerin.

The Rac small GTPase is associated with type I phosphatidylinositol 4-phosphate 5-kinase and
regulates the production of phosphatidylinositol 4,5-bisphosphate. The new protein is
expected to activate p21rac-related small GTPases.

The new protein can find application in modulating/blocking the response to a cellular 
receptor. 

similarity to GTPase-activating proteins 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus: unknown

Insert length: 3237 bp 

Poly A stretch at pos. 3227, no polyadenylation signal found 

1 GCGAAGTGAA GGGTGGCCCA GGTGGGGCCA GGCTGACTGA ATGTATCTCC 
51 TAGCTATGGA CTAAATAATA CATGGGGGGA AATAAACAAG TATTCATGAG 

101 GGTGAAAATG TGACCCAGCA GGAAAATTAC AACTATTTTC AATTGACGTT 

151 GAATAGGATG AGTCATGGAA TTTAAGTGAT TTACTGAAGA TTATACTACT

201 GGTAGATAGA AGAGCTAAAG AAAGATGGAT ACTATGATGC TGAATGTGCG 

251 GAATCTGTTT GAGCAGCTTG TGCGCCGGGT GGAGATTCTC AGTGAAGGAA 

301 ATGAAGTCCA ATTTATCCAG TTGGCGAAGG ACTTTGAGGA TTTCCGTAAA 

351 AAGTGGCAGA GGACTGACCA TGAGCTGGGG AAATACAAGG ATCTTTTGAT 

401 GAAAGCAGAG ACTGAGCGAA GTGCTCTGGA TGTTAAGCTG AAGCATGCAC 

451 GTAATCAGGT GGATGTAGAG ATCAAACGGA GACAGAGAGC TGAGGCTGAC 

501 TGCGAAAAGC TGGAACGACA GATTCAGCTG ATTCGAGAGA TGCTCATGTG 

551 TGACACATCT GGCAGCATTC AACTAAGCGA GGAGCAAAAA TCAGCTCTGG 

601 CTTTTCTCAA CAGAGGCCAA CCATCCAGCA GCAATGCTGG GAACAAAAGA 

651 CTATCAACCA TTGATGAATC TGGTTCCATT TTATCAGATA TCAGCTTTGA 

701 CAAGACTGAT GAATCACTGG ATTGGGACTC TTCTTTGGTG AAGACTTTCA 

751 AACTGAAGAA GAGAGAAAAG AGGCGCTCTA CTAGCCGACA GTTTGTTGAT 

801 GGTCCCCCTG GACCTGTAAA GAAAACTCGT TCCATTGGCT CTGCAGTAGA 

851 CCAGGGGAAT GAATCCATAG TTGCAAAAAC TACAGTGACT GTTCCCAATG 

901 ATGGCGGGCC CATCGAAGCT GTGTCCACTA TTGAGACTGT GCCATATTGG 

951 ACCAGGAGCC GAAGGAAAAC AGGTACTTTA CAACCTTGGA ACAGTGACTC 
1001 CACCCTGAAC AGCAGGCAGC TGGAGCCAAG AACTGAGACA GACAGTGTGG 
1051 GCACGCCACA GAGTAATGGA GGGATGCGCC TGCATGACTT TGTTTCTAAG 
1101 ACGGTTATTA AACCTGAATC CTGTGTTCCA TGTGGAAAGC GGATAAAATT 
1151 TGGCAAATTA TCTCTGAAGT GTCGAGACTG TCGTGTGGTC TCTCATCCAG 
1201 AATGTCGGGA CCGCTGTCCC CTTCCCTGCA TTCCTACCCT GATAGGAACA 
1251 CCTGTCAAGA TTGGAGAGGG AATGCTGGCA GACTTTGTGT CCCAGACTTC 
1301 TCCAATGATC CCCTCCATTG TTGTGCATTG TGTAAATGAG ATTGAGCAAA 
1351 GAGGTCTGAC TGAGACAGGC CTGTATAGGA TCTCTGGCTG TGACCGCACA 
1401 GTAAAAGAGC TGAAAGAGAA ATTCCTCAGA GTGAAAACTG TACCCCTCCT 
1451 CAGCAAAGTG GATGATATCC ATGCTATCTG TAGCCTTCTA AAAGACTTTC 
1501 TTCGAAACCT CAAAGAACCT CTTCTGACCT TTCGCCTTAA CAGAGCCTTT 
1551 ATGGAAGCAG CAGAAATCAC AGATGAAGAC AACAGCATAG CTGCCATGTA 
1601 CCAAGCTGTT GGTGAACTGC CCCAGGCCAA CAGGGACACA TTAGCTTTCC 
1651 TCATGATTCA CTTGCAGAGA GTGGCTCAGA GTCCACATAC TAAAATGGAT 
1701 GTTGCCAATC TGGCTAAAGT CTTTGGCCCT ACAATAGTGG CCCATGCTGT 
1751 GCCCAATCCA GACCCAGTGA CAATGTTACA GGACATCAAG CGTCAACCCA 
1801 AGGTGGTTGA GCGCCTGCTT TCCTTGCCTC TGGAGTATTG GAGTCAGTTC 
1851 ATGATGGTGG AGCAAGAGAA CATTGACCCC CTACATGTCA TTGAAAACTC 
1901 AAATGCCTTT TCAACACCAC AGACACCAGA TATTAAAGTG AGTTTACTGG 
1951 GACCTGTGAC CACTCCTGAA CATCAGCTTC TCAAGACTCC TTCATCTAGT 
2001 TCCCTGTCAC AGAGAGTCCG TTCCACCCTC ACCAAGAACA CTCCTAGATT 
2051 TGGGAGCAAA AGCAAGTCTG CCACTAACCT AGGACGACAA GGCAACTTTT 
2101 TTGCTTCTCC AATGCTCAAG TGAAGTCACA TCTGCCTGTT ACTTCCCAGC 
2151 ATTGACTGAC TATAAGAAAG GACACATCTG TACTCTGCTC TGCAGCCTCC 
2201 TGTACTCATT ACTACTTTTA GCATTCTCCA GGCTTTTACT CAAGTTTAAT 
2251 TGTGCATGAG GGTTTTATTA AAACTATATA TATCTCCCCT TCCTTCTCCT 
2301 CAAGTCACAT AATATCAGCA CTTTGTGCTG GTCATTGTTG GGAGCTTTTA 
2351 GATGAGACAT CTTTCCAGGG GTAGAAGGGT TAGTATGGAA TTGGTTGTGA 
2401 TTCTTTTTGG GGAAGGGGGT TATTGTTCCT TTGGCTTAAA GCCAAATGCT
2451 GCTCATAGAA TGATCTTTCT CTAGTTTCAT TTAGAACTGA TTTCCGTGAG 
2501 ACAATGACAG AAACCCTACC TATCTGATAA GATTAGCTTG TCTCAGGGTG 
2551 GGAAGTGGGA GGGCAGGGCA AAGAAAGGAT TAGACCAGAG GATTTAGGAT
2601 GCCTCCTTCT AAGAACCAGA AGTTCTCATT CCCCATTATG AACTGAGCTA
2651 TAATATGGAG CTTTCATAAA AATGGGATGC ATTGAGGACA GAACTAGTGA 
2701 TGGGAGTATG CGTAGCTTTG ATTTGGATGA TTAGGTCTTT AATAGTGTTG 
2751 AGTGGCACAA CCTTGTAAAT GTGAAAGTAC AACTCGTATT TATCTCTGAT 
2801 GTGCCGCTGG CTGAACTTTG GGTTCATTTG GGGTCAAAGC CAGTTTTTCT 
2851 TTTAAAATTG AATTCATTCT GATGCTTGGC CCCCATACCC CCAACCTTGT 
2901 CCAGTGGAGC CCAACTTCTA AAGGTCAATA TATCATCCTT TGGCATCCCA 
2951 ACTAACAATA AAGAGTAGGC TATAAGGGAA GATTGTCAAT ATTTTGTGGT 
3001 AAGAAAAGCT ACAGTCATTT TTTCTTTGCA CTTTGGATGC TGAAATTTTT 
3051 CCCATGGAAC ATAGCCACAT CTAGATAGAT GTGAGCTTTT TCTTCTGTTA 
3101 AAATTATTCT TAATGTCTGT AAAAACGATT TTCTTCTGTA GAATGTTTGA 
3151 CTTCGTATTG ACCCTTATCT GTAAAACACC TATTTGGGAT AATATTTGGA 
3201 AAAAAAGTAA ATAGCTTTTT CAAAATGAAA AAAAAAA 



BLAST Results 



Entry U82984 from database EMBLEST: 

Homo sapiens DRES 56 mRNA sequence.

Score = 8775, P = 0.0e+00, identities = 1757/1758

matches 3' end



Medline entries 



93074974:

Developmental regulation and neuronal expression of the mRNA of rat
n-chimaerin, a p21rac GAP: cDNA sequence.

93024458:

A Drosophila rotund transcript expressed during spermatogenesis and
imaginal disc morphogenesis encodes a protein which is similar to human
Rac GTPase-activating (racGAP) proteins.



Peptide information for frame 3 



ORF from 225 bp to 2120 bp; peptide length: 632 
Category: similarity to known protein 



1 MDTMMLNVRN LFEQLVRRVE ILSEGNEVQF IQLAKDFEDF RKKWQRTDHE 
51 LGKYKDLLMK AETERSALDV KLKHARNQVD VEIKRRQRAE ADCEKLERQI 
101 QLIREMLMCD TSGSIQLSEE QKSALAFLNR GQPSSSNAGN KRLSTIDESG 
151 SILSDISFDK TDESLDWDSS LVKTFKLKKR EKRRSTSRQF VDGPPGPVKK 
201 TRSIGSAVDQ GNESIVAKTT VTVPNDGGPI EAVSTIETVP YWTRSRRKTG 
251 TLQPWNSDST LNSRQLEPRT ETDSVGTPQS NGGMRLHDFV SKTVIKPESC 
301 VPCGKRIKFG KLSLKCRDCR VVSHPECRDR CPLPCIPTLI GTPVKIGEGM 
351 LADFVSQTSP MIPSIVVHCV NEIEQRGLTE TGLYRISGCD RTVKELKEKF 
401 LRVKTVPLLS KVDDIHAICS LLKDFLRNLK EPLLTFRLNR AFMEAAEITD 
451 EDNSIAAMYQ AVGELPQANR DTLAFLMIHL QRVAQSPHTK MDVANLAKVF
501 GPTIVAHAVP NPDPVTMLQD IKRQPKVVER LLSLPLEYWS QFMMVEQENI
551 DPLHVIENSN AFSTPQTPDI KVSLLGPVTT PEHQLLKTPS SSSLSQRVRS 
601 TLTKNTPRFG SKSKSATNLG RQGNFFASPM LK 



BLASTP hits 



Entry CEK08E3_4 from database TREMBLNEW: 

gene: "K08E3.6"; Caenorhabditis elegans cosmid K08E3 

Score = 452, P = 2.6e-48, identities = 126/377, positives = 189/377

Entry A48122 from database PIR:

GTPase-activating protein Rac homolog, splice form clone pcl.7 - fruit
fly (Drosophila melanogaster) (fragment)

Score = 480, P = 9.2e-46, identities = 111/270, positives = 155/270

Entry B48122 from database PIR:

GTPase-activating protein Rac homolog, splice form clone pcl.7d - fruit
fly (Drosophila melanogaster)

Score = 480, P = 9.2e-46, identities = 111/270, positives = 155/270

Entry DM22539_1 from database TREMBL:

gene: "rotund"; product: "rnracGAP"; Drosophila melanogaster rnracGAP
(rotund) gene, complete cds.

Score = 480, P = 9.2e-46, identities = 111/270, positives = 155/270

Entry S29128 from database PIR:
N-chimerin - rat

Score = 336, P = 8.8e-30, identities = 86/253, positives = 128/253



Alert BLASTP hits for DKFZphtes3_lcl, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphtes3_lcl, frame 3



Report for DKFZphtes3_lcl.3



[LENGTH] 632
[MW] 71026.84
[pI] 9.08
[HOMOL] PIR:B48122 GTPase-activating protein Rac homolog, splice form clone pcl.7d - fruit fly (Drosophila melanogaster) 2e-46
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YBR260c] 3e-12
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YER155c] 2e-11
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YER155c] 2e-11
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YER155c] 2e-11
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YDL240w] 3e-09
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YOR134w] 4e-09
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YOR134w] 4e-09
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YOR127w] 5e-09
[FUNCAT] 09.04 biogenesis of cytoskeleton [S. cerevisiae, YPL115c] 3e-08
[FUNCAT] 10.02.09 regulation of g-protein activity [S. cerevisiae, YPL115c] 3e-08
[BLOCKS] BL00479B Phorbol esters / diacylglycerol binding domain proteins
[BLOCKS] BL00479A Phorbol esters / diacylglycerol binding domain proteins
[SCOP] d1pbwa_ 1.83.1.1.2 p85 alpha subunit RhoGAP domain [human (Hom 1e-55
[SCOP] d1rgp 1.83.1.1.1 p50 RhoGAP domain [human (Homo sapiens) 1e-49
[PIRKW] breakpoint cluster region 1e-19
[PIRKW] transmembrane protein 7e-08
[PIRKW] brain 3e-22
[PIRKW] alternative splicing 1e-19
[PIRKW] P-loop 2e-25
[SUPFAM] CDC24 homology 3e-22
[SUPFAM] bcr protein 3e-22
[SUPFAM] myosin motor domain homology 2e-25
[SUPFAM] pleckstrin repeat homology 4e-10
[SUPFAM] LIM metal-binding repeat homology 2e-09
[SUPFAM] protein kinase C zinc-binding repeat homology 5e-29
[PROSITE] MYRISTYL 6
[PROSITE] AMIDATION 1
[PROSITE] CAMP_PHOSPHO_SITE 3
[PROSITE] CK2_PHOSPHO_SITE 13
[PROSITE] TYR_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 9
[PROSITE] ASN_GLYCOSYLATION 1
[PROSITE] DAG_PE_BINDING_DOMAIN 1
[PFAM] Phorbol esters / diacylglycerol binding domain
[KW] Irregular
[KW] 3D
[KW] LOW_COMPLEXITY 2.22 %
[KW] COILED_COIL 8.54 %



SEQ MDTMMLNVRNLFEQLVRRVEILSEGNEVQFIQLAKDFEDFRKKWQRTDHELGKYKDLLMK 

SEG 

COILS CCCCCCCCCCCC

1rgp-

SEQ AETERSALDVKLKHARNQVDVEIKRRQRAEADCEKLERQIQLIREMLMCDTSGSIQLSEE 

SEG 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

1rgp-

SEQ QKSALAFLNRGQPSSSNAGNKRLSTIDESGSILSDISFDKTDESLDWDSSLVKTFKLKKR 

SEG 

COILS

1rgp-



SEQ EKRRSTSRQFVDGPPGPVKKTRSIGSAVDQGNESIVAKTTVTVPNDGGPIEAVSTIETVP 

SEG 

COILS 

1rgp-

SEQ YWTRSRRKTGTLQPWNSDSTLNSRQLEPRTETDSVGTPQSNGGMRLHDFVSKTVIKPESC

SEG 

COILS 

1rgp-

SEQ VPCGKRIKFGKLSLKCRDCRVVSHPECRDRCPLPCIPTLIGTPVKIGEGMLADFVSQTSP 

SEG 

COILS 

1rgp-

SEQ MIPSIVVHCVNEIEQRGLTETGLYRISGCDRTVKELKEKFLRVKTVPLLSKVDDIHAICS

SEG 

COILS 

1rgp- . CCHHHHHHHHHHHHHHTTTTTTTTTCCCHHHHHHHHHHHHHCCCCCG-GGCCCCHHHHH

SEQ LLKDFLRNLKEPLLTFRLNRAFMEAAEITDEDNSIAAMYQAVGELPQANRDTLAFLMIHL

SEG 

COILS 

1rgp- HHHHHHHHTTTTTTTGGGHHHHHHTTTT-CGGGHHHHHHHHHHHCCHHHHHHHHHHHHHH

SEQ QRVAQSPHTKMDVANLAKVFGPTIVAHAVPNPDPVTMLQDIKRQPKVVERLLSLPLEYWS

SEG 

COILS 

1rgp- HHHHHHHHHCCCHHHHHHHHGGGCC

SEQ QFMMVEQENIDPLHVIENSNAFSTPQTPDIKVSLLGPVTTPEHQLLKTPSSSSLSQRVRS 

SEG xxxxxxxxxxx

COILS

1rgp-

SEQ TLTKNTPRFGSKSKSATNLGRQGNFFASPMLK

SEG xxx

COILS

1rgp-



Prosite for DKFZphtes3_lcl.3



PS00001 212->216 ASN_GLYCOSYLATION PDOC00001
PS00004 141->145 CAMP_PHOSPHO_SITE PDOC00004
PS00004 182->186 CAMP_PHOSPHO_SITE PDOC00004
PS00004 246->250 CAMP_PHOSPHO_SITE PDOC00004
PS00005 63->66 PKC_PHOSPHO_SITE PDOC00005
PS00005 174->177 PKC_PHOSPHO_SITE PDOC00005
PS00005 186->189 PKC_PHOSPHO_SITE PDOC00005
PS00005 245->248 PKC_PHOSPHO_SITE PDOC00005
PS00005 313->316 PKC_PHOSPHO_SITE PDOC00005
PS00005 392->395 PKC_PHOSPHO_SITE PDOC00005
PS00005 435->438 PKC_PHOSPHO_SITE PDOC00005
PS00005 595->598 PKC_PHOSPHO_SITE PDOC00005
PS00005 606->609 PKC_PHOSPHO_SITE PDOC00005
PS00006 47->51 CK2_PHOSPHO_SITE PDOC00006
PS00006 66->70 CK2_PHOSPHO_SITE PDOC00006
PS00006 144->148 CK2_PHOSPHO_SITE PDOC00006
PS00006 206->210 CK2_PHOSPHO_SITE PDOC00006
PS00006 234->238 CK2_PHOSPHO_SITE PDOC00006
PS00006 270->274 CK2_PHOSPHO_SITE PDOC00006
PS00006 323->327 CK2_PHOSPHO_SITE PDOC00006
PS00006 387->391 CK2_PHOSPHO_SITE PDOC00006
PS00006 392->396 CK2_PHOSPHO_SITE PDOC00006
PS00006 410->414 CK2_PHOSPHO_SITE PDOC00006
PS00006 449->453 CK2_PHOSPHO_SITE PDOC00006
PS00006 489->493 CK2_PHOSPHO_SITE PDOC00006
PS00006 579->583 CK2_PHOSPHO_SITE PDOC00006
PS00007 46->55 TYR_PHOSPHO_SITE PDOC00007
PS00007 376->385 TYR_PHOSPHO_SITE PDOC00007
PS00008 131->137 MYRISTYL PDOC00008
PS00008 150->156 MYRISTYL PDOC00008
PS00008 276->282 MYRISTYL PDOC00008
PS00008 377->383 MYRISTYL PDOC00008
PS00008 388->394 MYRISTYL PDOC00008
PS00008 623->629 MYRISTYL PDOC00008
PS00009 303->307 AMIDATION PDOC00009
PS00479 287->336 DAG_PE_BINDING_DOMAIN PDOC00379



Pfam for DKFZphtes3_lcl.3

HMM_NAME Phorbol esters / diacylglycerol binding domain 

HMM *HrFmrHTFrqPTWCDHCgeFIWGWgKQGYQCQnCgMNCHKRCHelVPmm 

H+F+ +T + P +C CG +1 +GK ++C +C+++ H +C+ + P 
Query 287 HDFVSKTVIKPESCVPCGKRI-KFGKLSLKCRDCRVVSHPECRDRCPLP 334 

HMM C* 
C 

Query 335 C 335



DKFZphtes3_lgl3



group: intracellular transport and trafficking 

DKFZphtes3_lgl3 encodes a novel 1007 amino acid protein with similarity to human 256 kD
golgin.

The new protein contains 7 leucine zippers and seems to be involved in protein-protein
interactions in the Golgi apparatus. The very similar rat cpl51 shows
haploid-specific transcription in Mus musculus testis.

The new protein can find application in modulating protein traffic in the Golgi apparatus,
especially in human haploid germ cells.



similarity to 256 kD golgin, strong similarity to rat "cpl51"
21 exons encoded on AC004682

EST from a testis library, two mouse ESTs of a testis cDNA library,
rat cpl51 shows haploid-specific transcription!
testis or haploid-specific transcription

Sequenced by DKFZ

Locus: map="16q22.2" 

Insert length: 3405 bp 

Poly A stretch at pos. 3394, polyadenylation signal at pos. 3373 



1 GGGATAGGGG ATGTGGTTTG TTACAAAGGA TGAGTATTTT GATAGCTTCT
51 CATTCCTTGA ACTATTCTGC AGGTTTATAA CAAAGCTCAG AAAATACTAA
101 AGGTTAAAGG AGAATTGAGA GCTGCCAAGG AAATGAAAGA TGAGGCGGGG
151 GAGAGAGACA GAGAAGTGAG CAGCCTGAAC AGCAAGCTGT TAAGCCTGCA
201 ACTTGACATC AAGAATCTGC ACGATGTCTG CAAGAGACAG AGGAAGACCT
251 TGCAGGACAA TCAGCTCTGC ATGGAGGAGG CAATGAACAG CAGCCACGAC
301 AAGAAGCAAG CACAGGCATT AGCATTCGAG GAGTCAGAGG TGGAATTTGG
351 GTCCAGTAAA CAGTGTCATC TGAGACAACT CCAGCAACTG AAGAAAAAAT
401 TGCTGGTCCT TCAACAAGAA CTGGAGTTTC ACACAGAGGA GTTGCAGACT
451 TCTTACTATT CTCTCCGCCA GTATCAGTCC ATCCTAGAGA AGCAGACTTC
501 CGACCTGGTT CTTCTGCACC ATCACTGCAA ACTGAAAGAA GATGAGGTGA
551 TTCTCTATGA GGAGGAAATG GGAAATCACA ACGAGAACAC AGGGGAGAAG
601 CTCCATTTGG CGCAGGAGCA ACTCGCCTTG GCCGGGGACA AGATCGCCTC
651 TCTAGAGAGG AGCTTAAACC TCTACAGGGA TAAATACCAG TCTTCCCTGA
701 GCAACATCGA GTTACTAGAA TGCCAAGTGA AGATGTTGCA GGGGGAACTC
751 GGCGGGATCA TGGGTCAGGA GCCTGAGAAC AAGGGTGATC ATTCAAAGGT
801 ACGGATATAC ACTTCTCCTT GCATGATTCA AGAGCATCAG GAGACTCAGA
851 AACGACTGTC TGAAGTCTGG CAAAAGGTCT CTCAACAGGA TGATCTCATT
901 CAAGAACTTC GAAATAAGCT GGCCTGCAGT AACGCTTTGG TTCTGGAGCG
951 TGAAAAGGCT TTGATAAAAC TACAAGCCGA TTTTGCTTCC TGTACAGCCA
1001 CCCACAGATA CCCTCCTAGC TCCTCAGAAG AGTGTGAAGA CATCAAAAAG
1051 ATACTGAAGC ACTTGCAGGA GCAGAAAGAC AGCCAGTGCC TGCATGTGGA
1101 GGAGTACCAG AACCTGGTGA AGGATCTGCG CGTGGAACTA GAGGCCGTGT
1151 CGGAACAGAA GAGAAACATC ATGAAGGACA TGATGAAGCT GGAGCTGGAC
1201 CTGCACGGAC TGCGGGAGGA GACATCTGCC CACATTGAGA GGAAGGATAA
1251 GGACATCACC ATCCTGCAGT GCCGGCTGCA GGAGCTGCAG CTGGAGTTCA
1301 CCGAGACCCA AAAGCTCACT TTGAAGAAAG ACAAGTTCCT CCAAGAGAAA
1351 GATGAGATGC TGCAAGAGCT GGAGAAGAAA CTGACACAGG TTCAGAACAG
1401 CCTCCTGAAA AAGGAGAAGG AGCTGGAGAA GCAGCAGTGC ATGGCCACAG
1451 AACTTGAAAT GACAGTCAAG GAGGCTAAGC AGGACAAGTC CAAGGAGGCG
1501 GAGTGCAAGG CCCTGCAGGC TGAGGTCCAG AAGCTGAAGA ACAGTCTCGA
1551 AGAGGCCAAG CAGCAGGAGA GGCTGGCTGC TCAGCAAGCA GCCCAGTGCA
1601 AAGAAGAGGC TGCACTGGCA GGCTGTCACC TGGAGGACAC CCAGAGGAAA
1651 CTGCAGAAGG GTCTCCTCCT GGACAAGCAG AAGGCAGACA CCATCCAGGA
1701 ACTACAGAGA GAACTTCAGA TGCTGCAGAA GGAGTCCTCG ATGGCTGAGA
1751 AGGAACAAAC CTCCAACAGA AAACGGGTGG AGGAGCTGTC ATTAGAACTC
1801 TCTGAAGCCC TGAGGAAGCT TGAAAATTCA GACAAGGAAA AGAGGCAGCT
1851 TCAGAAGACA GTGGCTGAGC AGGATATGAA AATGAATGAC ATGCTTGATC
1901 GTATCAAGCA CCAGCACAGG GAGCAAGGCT CCATCAAATG CAAGTTAGAA
1951 GAAGATCTTC AGGAGGCCAC AAAGCTTCTG GAGGACAAAC GGGAGCAGTT
2001 GAAGAAGAGC AAAGAGCATG AGAAGCTGAT GGAGGGAGAA CTTGAAGCTT
2051 TGCGGCAGGA ATTTAAAAAG AAAGACAAGA CGTTGAAAGA GAATTCCAGA
2101 AAGTTGGAGG AAGAAAATGA GAATCTCCGA GCAGAGCTAC AGTGTTGTTC
2151 TACACAACTG GAATCCTCTC TCAACAAATA CAACACCAGC CAGCAAGTCA
2201 TCCAAGACTT GAATAAAGAG ATAGCCCTTC AGAAGGAGTC CTTAATGAGC
2251 CTGCAGGCCC AGCTGGACAA AGCTCTGCAG AAGGAGAAGC ACTATCTCCA
2301 GACTACCATC ACCAAAGAAG CCTATGATGC ATTATCCCGG AAGTCAGCCG
2351 CCTGCCAGGA TGACCTGACA CAAGCCCTCG AGAAGCTCAA TCACGTGACC
2401 TCAGAGACAA AGAGCCTGCA GCAAAGCTTG ACACAGACCC AAGAGAAGAA
2451 AGCTCAGCTG GAAGAGGAAA TCATTGCTTA TGAGGAAAGG ATGAAAAAGC
2501 TCAATACGGA ATTAAGAAAA CTGCGGGGCT TCCACCAGGA GAGTGAGCTG 
2551 GAGGTGCACG CCTTTGACAA GAAGCTAGAG GAGATGAGCT GCCAGGTGCT 
2601 GCAGTGGCAG AAGCAACACC AGAATGACCT CAAGATGCTG GCAGCCAAAG 
2651 AGGAGCAGCT CAGGGAGTTC CAGGAGGAGA TGGCCGCCTT AAAAGAGAAC 
2701 CTCCTTGAGG ACGATAAGGA GCCCTGCTGC CTGCCCCAGT GGTCTGTGCC 
2751 CAAAGACACC TGTAGGCTCT ACCGAGGGAA TGATCAGATT ATGACCAACT 
2801 TGGAGCAATG GGCAAAACAG CAGAAGGTCG CCAATGAGAA ACTAGGAAAC 
2851 CAGCTCCGAG AGCAGGTGAA CTACATTGCC AAGCTGAGTG GCGAAAAGGA 
2901 CCACCTCCAC AGTGTAATGG TCCACTTGCA GCAGGAAAAC AAGAAGCTGA 
2951 AGAAGGAGAT AGAAGAGAAG AAGATGAAAG CCGAGAACAC AAGGCTATGC 
3001 ACCAAAGCCC TAGGCCCGAG CAGAACGGAG TCCACACAGA GGGAGAAAGT 
3051 GTGCGGCACC TTGGGCTGGA AGGGGTTGCC CCAGGATATG GGTCAAAGAA 
3101 TGGACCTCAC CAAGTACATC GGGATGCCCC ACTGCCCGGG TTCCTCATAC 
3151 TGCTAGAATC CACATCTAGC CCTGAGCAGC ATTTCCACGG GTGTTTCTTC 
3201 AGAGGACAGT GAGTTCCCAG CCCTCCCTCT CTCTTGACCT GGATCAGCTC 
3251 TTACAGGAGT ATATCACGGT CCCAGCCTAT TTTGCAAGAC ACTAACTTTT 
3301 GTTGAGTTTT GTCCACTTCC TGCCATGGAG TGAGCTTTAG AACCATACTA 
3351 CCATCTCCAG GCCCAAACTC TGAAATAAAG ACATGAGCAT GAGCAAAAAA 
3401 AAAAA 



BLAST Results 



Entry AC004682 from database EMBLNEW:

Homo sapiens Chromosome 16 BAC clone CIT987SK-A-259H10, complete
sequence.

Score = 1291, P = 0.0e+00, identities = 265/272



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 133 bp to 3153 bp; peptide length: 1007 

Category: similarity to known protein 

Prosite motifs: LEUCINE_ZIPPER (83-105) 

LEUCINE_ZIPPER (90-112) 

LEUCINE_ZIPPER (97-119) 

LEUCINE_ZIPPER (104-126) 

LEUCINE_ZIPPER (403-425) 

LEUCINE_ZIPPER (410-432) 

LEUCINE_ZIPPER (918-940) 
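
Several of the leucine zipper hits above overlap, which is expected when one heptad repeat of leucines runs for more than four turns. The sketch below scans residues 81 to 130 of the peptide with the Prosite leucine zipper pattern L-x(6)-L-x(6)-L-x(6)-L, using a lookahead so that overlapping matches are all reported. The function name `leucine_zippers` is hypothetical, and the end convention (start + 22) is an assumption inferred from entries such as `(83-105)`.

```python
import re

# L-x(6)-L-x(6)-L-x(6)-L inside a lookahead, so overlapping occurrences are found.
ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")

# Residues 81-130 of the frame 1 peptide, copied from the listing.
FRAGMENT = "RQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSILEKQTSDLVLLHH"


def leucine_zippers(sequence, offset=1):
    """Return (start, end) pairs; 'offset' is the residue number of sequence[0]."""
    span = 22  # length of one pattern match: 4 leucines plus 3 gaps of 6 residues
    return [(m.start() + offset, m.start() + offset + span)
            for m in ZIPPER.finditer(sequence)]


zippers = leucine_zippers(FRAGMENT, offset=81)
```

On this fragment the scan returns the first four overlapping hits of the listing, starting at residues 83, 90, 97 and 104.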



1 MKDEAGERDR EVSSLNSKLL SLQLDIKNLH DVCKRQRKTL QDNQLCMEEA 
51 MNSSHDKKQA QALAFEESEV EFGSSKQCHL RQLQQLKKKL LVLQQELEFH 
101 TEELQTSYYS LRQYQSILEK QTSDLVLLHH HCKLKEDEVI LYEEEMGNHN 
151 ENTGEKLHLA QEQLALAGDK IASLERSLNL YRDKYQSSLS NIELLECQVK 
201 MLQGELGGIM GQEPENKGDH SKVRIYTSPC MIQEHQETQK RLSEVWQKVS 
251 QQDDLIQELR NKLACSNALV LEREKALIKL QADFASCTAT HRYPPSSSEE 
301 CEDIKKILKH LQEQKDSQCL HVEEYQNLVK DLRVELEAVS EQKRNIMKDM 
351 MKLELDLHGL REETSAHIER KDKDITILQC RLQELQLEFT ETQKLTLKKD 
401 KFLQEKDEML QELEKKLTQV QNSLLKKEKE LEKQQCMATE LEMTVKEAKQ 
451 DKSKEAECKA LQAEVQKLKN SLEEAKQQER LAAQQAAQCK EEAALAGCHL 
501 EDTQRKLQKG LLLDKQKADT IQELQRELQM LQKESSMAEK EQTSNRKRVE 
551 ELSLELSEAL RKLENSDKEK RQLQKTVAEQ DMKMNDMLDR IKHQHREQGS 
601 IKCKLEEDLQ EATKLLEDKR EQLKKSKEHE KLMEGELEAL RQEFKKKDKT 
651 LKENSRKLEE ENENLRAELQ CCSTQLESSL NKYNTSQQVI QDLNKEIALQ 
701 KESLMSLQAQ LDKALQKEKH YLQTTITKEA YDALSRKSAA CQDDLTQALE 
751 KLNHVTSETK SLQQSLTQTQ EKKAQLEEEI IAYEERMKKL NTELRKLRGF 
801 HQESELEVHA FDKKLEEMSC QVLQWQKQHQ NDLKMLAAKE EQLREFQEEM 
851 AALKENLLED DKEPCCLPQW SVPKDTCRLY RGNDQIMTNL EQWAKQQKVA 
901 NEKLGNQLRE QVNYIAKLSG EKDHLHSVMV HLQQENKKLK KEIEEKKMKA 
951 ENTRLCTKAL GPSRTESTQR EKVCGTLGWK GLPQDMGQRM DLTKYIGMPH 
1001 CPGSSYC 
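
The reported ORF coordinates and peptide length are mutually consistent, which can be checked with a line of arithmetic. The sketch below assumes the coordinates are 1-based and inclusive and that the stop codon is not counted in the ORF (in the nucleotide listing, positions 3151-3153 read TGC and are followed by TAG). The function name is hypothetical.

```python
def orf_peptide_length(start_bp, end_bp):
    """Codon count for an ORF given 1-based, inclusive nucleotide coordinates.

    Assumes the stop codon lies outside the interval.
    """
    span = end_bp - start_bp + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3

# ORF from 133 bp to 3153 bp: 3021 nucleotides, i.e. 1007 codons.
print(orf_peptide_length(133, 3153))
```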

BLASTP hits 
Entry HS417401_1 from database TREMBL: 

product: "trans-Golgi p230"; Human trans-Golgi p230 mRNA, complete
cds.

Score = 411, P = 3.9e-34, identities = 212/862, positives = 420/862
Entry SCINTANA_1 from database TREMBL: 

Saccharomyces cerevisiae integrin analogue gene, complete cds. 
Score = 404, P = 6.2e-34, identities = 199/897, positives = 423/897

Entry HS6802_2 from database TREMBL: 

gene: "MYH9"; product: "dJ6802.2"; Homo sapiens DNA sequence from PAC
6802 on chromosome 22. Contains apolipoprotein L, myosin heavy chain, 
ESTs, CA repeat, STS and GSS. 

Score = 404, P = 1.9e-33, identities = 231/1028, positives = 469/1028
Entry AF092090_1 from database TREMBL: 

product: "cp151"; Rattus norvegicus cp151 mRNA, partial cds.

Score = 2523, P = 3.0e-262, identities = 506/733, positives = 611/733
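
The identities and positives fractions in these summary lines convert directly to the percentages that the HSP blocks print. The sketch below assumes the report truncates rather than rounds (the HSP headers show 212/816 as 25%, which is 25.98% before truncation). The regular expression and function name are illustrative and expect a clean "=" sign, so OCR variants of "=" are not handled.

```python
import re

# Matches e.g. "identities = 506/733"; assumes a clean "=" separator.
FRACTION = re.compile(r"(identities|positives)\s*=\s*(\d+)/(\d+)", re.IGNORECASE)

def percentages(summary_line):
    """Map each identities/positives fraction to a truncated integer percent."""
    return {
        name.lower(): 100 * int(num) // int(den)
        for name, num, den in FRACTION.findall(summary_line)
    }

line = "Score = 2523, P = 3.0e-262, identities = 506/733, positives = 611/733"
print(percentages(line))
```

For the rat cp151 hit this gives 69% identities and 83% positives, far above the 20-25% identities of the golgin and myosin hits.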



Alert BLASTP hits for DKFZphtes3_lgl3, frame 1

TREMBL:HSGOLGIN_1 product: "256 kD golgin"; H. sapiens mRNA for golgin,
N = 1, Score = 411, P = 4.4e-34 

TREMBL:HS417401_1 product: "trans-Golgi p230"; Human trans-Golgi p230
mRNA, complete cds., N = 1, Score = 411, P = 4.5e-34

TREMBL:SCINTANA_1 Saccharomyces cerevisiae integrin analogue gene,
complete cds., N = 1, Score = 404, P = 7.1e-34



>TREMBL:HSGOLGIN_1 product: "256 kD golgin"; H. sapiens mRNA for golgin
Length = 2,185 

HSPs: 

Score = 411 (61.7 bits), Expect = 4.4e-34, P = 4.4e-34
Identities = 212/816 (25%), Positives = 420/816 (51%)

Query: 145 EMGNHNEN-TGEKLHLAQEQLALAGDKIASLERSLNLYRDKYQSSLSNIELLECQVKMLQ 203 

+M + E+ G L +EQL ++ +ERSL+ YR KY ++ ++L+ + K LQ 

Sbjct: 119 DMDSEAEDLVGNSDSLNKEQLI QRLRRMERSLSSYRGKYSELVTAYQMLQREKKKLQ 175 

Query: 204 GELGGIMGQEPENKGDHSKVRIYTSPCMIQEHQETQKRLSEVWQ-KVSQQDDLIQELRNK 262 

G I+ Q D S RI +Q Q+ +K L E + + ++D I L+ +

Sbjct: 176 G ILSQSQ DKSLRRIAELREELQMDQQAKKHLQEEFDASLEEKDQYISVLQTQ 227 

Query: 263 LAC------SNALVLEREKALIKLQADFASCTATHRYPPSSSEEC-ED--IKKILKHLQE 313

++ + + ++ K L +L+ A P S E ED K L+ LQ+ 

Sbjct: 228 VSLLKQRLRNGPMNVDVLKPLPQLEPQ-AEVFTKEENPESDGEPVVEDGTSVKTLETLQQ 286 

Query: 314 QKDSQ CLH-VEEYQNLVKDLRVELEAVSEQKRNIMKDMMKLELDLHGLREETSA 366 

+ Q C ++ ++ L E EA+ EQ ++++ K++ DLH + E+T 

Sbjct: 287 RVKRQENLLKRCKETIQSHKEQCTLLTSEKEALQEQLDERLQELEKIK-DLH-MAEKTKL 344 

Query: 367 HIERKDKDITILQCRLQELQLEFTETQKLTLKKDKFLQEKDEMLQELEKKLTQV--QNSL 424 

+ +D I Q Q+ + ET++ + + L+ K+E + +L ++ Q+ Q 
Sbjct: 345 ITQLRDAKNLIEQLE-QDKGMVIAETKR QMHETLEMKEEEIAQLRSRIKQMTTQGEE 400 

Query: 425 LKKEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAEVQKLKNSLEEAKQQERLAAQ 484 

L+++KE + ++ ELE + A+ K++EA K L+AE+ + ++E+ ++ER++ Q 
Sbjct: 401 LREQKE-KSERAAFEELEKALSTAQ--KTEEARRK-LKAEMDEQIKTIEKTSEEERISLQ 456

Query: 485 QA-AQCKEEAA-LAGCHLEDTQRKLQKGLLLDKQKADTIQELQRELQMLQKESSMAEKEQ 542 

Q ++ K+E + E+ KLQK L +K+ A QEL ++LQ ++E E+ + 

Sbjct: 457 QELSRVKQEVVDVMKKSSEEQIAKLQK — LHEKELARKEQELTKKLQTRERE — FQEQMK 512 

Query: 543 TSNRKRVEELSLELSEALRKLENSDKEKRQLQKT— VAEQDMKMNDMLDRIKHQHREQGS 600 

+ K E L++S+ + E+ E+ +LQK + E + K+ D+ + 
Sbjct: 513 VALEKSQSEY-LKISQEKEQQESLALEELELQKKAILTESENKLRDLQQEAETYRTRILE 571 

Query: 601 IKCKLEEDLQEATKLLED KREQLKKSKEHEKLMEG ELEALR-QEFKKKDKTL 651 

++ LE+ LQE +D + E+ K +KE ++E ELE+L+ Q+ + L 

Sbjct: 572 LESSLEKSLQENKNQSKDLAVHLEAEKNKHNKEITVMVEKHKTELESLKHQQDALWTEKL 631 

Query: 652 KENSRKLEEENENLRAELQCCSTQLESSL-NKYNTSQQVIQDLNKE IALQKESLMS 706 

+ ++ + E E LR + C + E+ L +K Q I+++N++ + +++ L S 
Sbjct: 632 QVLKQQYQTEMEKLREK---CEQEKETLLKDKEIIFQAHIEEMNEKTLEKLDVKQTELES 688

Query: 707 LQAQLDKALQKEKHYLQT— TITKEAYDALSRKSAACQDDLTQALEKLNHVTSETKSLQQ 764 
L ++L + L K +H L+ ++ K+ D + ++ A D+ Q V S K + 
Sbjct: 689 LSSELSEVL-KARHKLEEELSVLKDQTDKMKQELEAKMDE--QKNHHQQQVDSIIKEHEV 745

Query: 765 SLTQTQEKKAQLEEEIIAYEERMKKLNTELRKLRGFHQESELEVHAFDKKLEEMSCQVLQ 824 

S + +T+ KA L+++I E +K+ + L++ + + E ++ + +L++ S ++ 
Sbjct: 746 SIQRTE--KA-LKDQINQLELLLKERDKHLKEHQAHVENLEADIKRSEGELQQASAKLDV 802 

Query: 825 WQKQHQNDLKMLAAKEEQLREFQEEMAALKENLLEDDKEPCCLPQW SVPKDTC-R 878 

+Q +Q+ A EQ + ++E++A L++ LL+ + E L + + KD C 

Sbjct: 803 FQS-YQS ATHEQTKAYEEQLAQLQQKLLDLETERILLTKQVAEVEAQKKDVCTE 855 

Query: 879 LYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQVNYIAKLS-GEKDHLHSVMVHLQQENK 937 

L Q+ ++Q KQ +K+ + QV Y +KL G K+ + + +++EN 

Sbjct: 856 LDAHKIQVQDLMQQLEKQNSEMEQKVKSLT — QV-YESKLEDGNKEQEQTKQILVEKENM 912 

Query: 938 KLK-KEIEEKKMKAENTRLCTK 958 

L+ +E ++K+++ +L K 
Sbjct: 913 ILQMREGQKKEIEILTQKLSAK 934 

Score = 338 (50.7 bits), Expect = 3.1e-26, P = 3.1e-26 
Identities - 216/953 (22%), Positives » 468/953 (49%) 

Query: 2 KDEAGERDRE— VSSLNS-KLL-SLQLDIKNLHDVCKRQRKTLQDN-QLCM EEAM 51 

K+E E D E V S K L +LQ +K ++ KR ++T+Q + + C +EA+ 
Sbjct: 260 KEENPESDGEPVVEDGTSVKTLETLQQRVKRQENLLKRCKETIQSHKEQCTLLTSEKEAL 319

Query: 52 NSSHDKKQAQALAFEESEVEFGSSKQCHLRQ LQQLK — KKLLVLQQELEFHTEELQ 105 

D++ + ++ + + LR ++QL+ K +++ + + + H E L+ 

Sbjct: 320 QEQLDERLQELEKIKDLHMAEKTKLITQLRDAKNLIEQLEQDKGMVIAETKRQMH-ETLE 378 

Query: 106 TSYYSLRQYQSILEKQTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQL- 164 

+ Q +S +++ T+ L K K + EE +T +K A+ +L 

Sbjct: 379 MKEEEIAQLRSRIKQMTTQGEELREQ-KEKSERAAFEELEKAL STAQKTEEARRKLK 434 

Query: 165 ALAGDKIASLERSLNLYRDKYQSSLSNI--ELLECQVKMLQGELGGIMGQEPENKGDHSK 222 

A ++I ++E++ R Q LS + E+++ K + ++ + Q+ K K 
Sbjct: 435 AEMDEQIKTIEKTSEEERISLQQELSRVKQEVVDVMKKSSEEQIAKL--QKLHEKELARK 492

Query: 223 VRIYTSPCMIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQA 282 

+ T +E +E Q+++ +K SQ + L ++ + +L LE ++LQ 
Sbjct: 493 EQELTKKLQTRE-REFQEQMKVALEK-SQSEYL— KISQEKEQQESLALEE LELQK 544 

Query: 283 DFASCTATHRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAV-SE 341 

A T + +EE++L++ ++E +K KDL V LEA ++ 

Sbjct: 545 K-AILTESENKLRDLQQEAETYRTRILELESSLEKS LQENKNQSKDLAVHLEAEKNK 600 

Query: 342 QKRNIMKDMMKLELDLHGLREETSAHIERKDKDITI-LQCRLQELQLEFTETQKLTLKKD 400 

+I +K++LL++A K++ Q +++L+ E E +K TL KD
Sbjct: 601 HNKEITVMVEKHKTELESLKHQQDALWTEKLQVLKQQYQTEMEKLR-EKCEQEKETLLKD 659 

Query: 401 K FLQEKDEM-LQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAKQDKS 453 

K ++E +E L++L+ K T+++ SL + E+ K + E E++V + + DK 

Sbjct: 660 KEIIFQAHIEEMNEKTLEKLDVKQTELE-SLSSELSEVLKARHKLEE-ELSVLKDQTDKM 717 

Query: 454 K-EAECKALQAEVQKLKNSLEEAKQQERLAAQQAAQC-KEEAALAGCHLEDTQRKLQKGL 511 

K E E K + + + ++ ++ ++ Q+ + K++ L++ + L++ 

Sbjct: 718 KQELEAK-MDEQKNHHQQQVDSIIKEHEVSIQRTEKALKDQINQLELLLKERDKHLKEHQ 776 

Query: 512 L-LDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEK 570 

++ +AD I+ + ELQ + + + Q++ ++ + +L++ +KL + + E+
Sbjct: 777 AHVENLEAD-IKRSEGELQQASAKLDVFQSYQSATHEQTKAYEEQLAQLQQKLLDLETER 835 

Query: 571 RQLQKTVAEQDMKMNDM---LD--RIKHQHREQGSIK--CKLEEDLQEATKLLEDKREQL 623

L K VAE + + D+ LD +I+ Q Q K ++E+ ++ T++ EKE
Sbjct: 836 ILLTKQVAEVEAQKKDVCTELDAHKIQVQDLMQQLEKQNSEMEQKVKSLTQVYESKLEDG 895 

Query: 624 KKSKEHEK — LMEGELEALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSLN 681 

K+E K L+E E L+ +K K ++ ++KL + +++ + T+ ++ 
Sbjct: 896 NKEQEQTKQILVEKENMILQMREGQK-KEIEILTQKLSAKEDSIHILNEEYETKFKNQEK 954 

Query: 682 KYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAAC 741 

K +Q +++ + + K+ L+ +A+L K L E L+ + ++ ++A + A 
Sbjct: 955 KME KV KQ KAKEMQET L KKKLLDQEAKLKKEL-7ENTALELSQKEKQFNAKMLEMAQA 1009 

Query: 742 QD-DLTQALEKLNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELRKLRGF 800 

++ A+ +L T++ -*■ ++ SLT+ + +L + I +E KKLN + +L+ 
Sbjct: 1010 NSAGISDAVSRLE — TNQKEQIE-SLTEVHRR — ELNDVISIWE KKLNQQAEELQEI 1061 

Query: 801 HQESELEVHAFDKKLEEMSCQVLQW — QKQHQNDLKMLAAKEEQLREFQEEMAALKENLL 858 

H E+++ ++++ E+ ++L + +K+ N ++ KEE +++ + L+E L 
Sbjct: 1062 H EIQLQEKEQEVAELKQKILLFGCEKEEMNK-EITWLKEEGVKQ-DTTLNELQEQLK 1116 
Query: 859 EDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQ--WAKQQKVANEKLGNQLREQVNYI- 915

+ L Q K L + + +L++ + ++Q V + L + + +V+ + 

Sbjct: 1117 QKSAHVNSLAQ-DETKLKAHLEKLEVDLNKSLKENTFLQEQLVELKMLAEEDKRKVSELT 1175 

Query: 916 AKLSGEKDHLHSVMVHLQQENKKLK-KEIEEKKMKAE 951 

+KL + S+ ++ NK L+ K +E KK+ E 
Sbjct: 1176 SKLKTTDEEFQSLKSSHEKSNKSLEDKSLEFKKLSEE 1212

Score = 337 (50.6 bits), Expect = 4.0e-26, P = 4.0e-26
Identities = 215/951 (22%), Positives = 433/951 (45%)

Query: 10 REVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKKQAQALAFEESE 69 

+E + +++L L+ ++ K Q K L + EA + H+K+ + E+ + 

Sbjct: 560 QEAETYRTRILELESSLEKSLQENKNQSKDLAVHL EAEKNKHNKEIT— VMVEKHK 613 

Query: 70 VEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSILEKQTSDLVLLH 129 

E S K H +Q +KL VL+Q+ + E+L+ Q + L K +++ 

Sbjct: 614 TELESLK— H-QQDALWTEKLQVLKQQYQTEMEKLREK— CEQEKETLLKD-KEIIFQA 666 

Query: 130 HHCKLKE DEVILYEEEMGNHNENTGEKL HLAQEQLALAGDKIASLERSLNLYRD 183 

H ++ E +++ ++E+++ EL H +E+L++ D+ +++ L D 
Sbjct: 667 HIEEMNEKTLEKLDVKQTELESLSSELSEVLKARHKLEEELSVLKDQTDKMKQELEAKMD 726 

Query: 184 K YQSSLSNIELLECQVKMLQGE — LGGIMGQEPENKGDHSKVRIYTSPCMIQEHQE 237 

+ +Q++I + E+V + + EL +Q +K+ ++ + 

Sbjct: 727 EQKNHHQQQVDSI-IKEHEVSIQRTEKALKDQINQLELLLKERDK-HLKEHQAHVENLEA 784 

Query: 238 TQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASCTATHRYPPSS 297 

KR Q+ S + D+ Q ++ ++ E+ L +LQ T R 
Sbjct: 785 DIKRSEGELQQASAKLDVFQSYQS ATHEQTKAYEEQLAQLQQKLLDLE-TERIL 837 

Query: 298 SEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRNIMKDMMKL-ELD 356 

+ K + ++ QK C ++ ++ V+DL +LE + + +K + ++ E 
Sbjct: 838 LTKQVAEVEAQKKDVCTELDAHKIQVQDLMQQLEKQNSEMEQKVKSLTQVYESK 891

Query: 357 LH-GLREETSAHIERKDKDITILQCRL-QELQLEFTETQKLTLKKDKF — LQEKDEM-LQ 411 

L G +E+ +K+ ILQ R Q+ ++E TQKL+ K+D L E+ E + 

Sbjct: 892 LEDGNKEQEQTKQILVEKENMILQMREGQKKEIEIL-TQKLSAKEDSIHILNEEYETKFK 950 

Query: 412 ELEKKLTQVQNSLLK KEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAEVQ 466 

EKK+ +V+ + K+K L+++ + ELE T E Q K K+ K L+ Q 

Sbjct: 951 NQEKKMEKVKQKAKEMQETLKKKLLDQEAKLKKELENTALELSQ-KEKQFNAKMLEM-AQ 1008 

Query: 467 KLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKADTIQELQR 526 

+ +A RLQQ+ + LD +KL Q+A+ +QE+ 
Sbjct: 1009 ANSAGISDAVS — RLETNQKEQIESLTEVHRRELNDVISIWEKKL NQQAEELQEIH- 1062 

Query: 527 ELQMLQKESSMAEKEQT SNRKRV EELSLELSEALRKLENSDKEKRQLQ 574 

E+Q+ +KE +AE +Q K + +E ++ L +L+ K+K 

Sbjct: 1063 EIQLQEKEQEVAELKQKILLFGCEKEEMNKEITWLKEEGVKQDTTLNELQEQLKQKSAHV 1122 

Query: 575 KTVAEQDMKMNDMLDRIKHQHREQGSIKCKLEEDLQEATKLLEDKREQLKKSKEHEKLME 634 

++A+ + K+ L++++ + L+E LE LE+++++ K+ 

Sbjct: 1123 NSLAQDETKLKAHLEKLEVDLNKSLKENTFLQEQLVELKMLAEEDKRKVSELTSKLKTTD 1182 

Query: 635 GELEALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSLNKYNTSQQVIQDLN 694 

E ++L+ +K +K+L++ S + ++ +E L +L C + E+ L T++ + + 
Sbjct: 1183 EEFQSLKSSHEKSNKSLEDKSLEFKKLSEELAIQLDICCKKTEALLEA-KTNELINISSS 1241 

Query: 695 KEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDDLT QALE 750 

K A+ + Q + K KE ++T E +A R+ Q+ L • QA 
Sbjct: 1242 KTNAILSR-ISHCQHRTTKV—KEALLIKTCTVSEL-EAQLRQLTEEQNTLNISFQQATH 1297 

Query: 751 KLNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLN TELRK— LRGFHQESE 805 

+L ++ KS++ + +K L++E ++ + T+L+K + + 

Sbjct: 1298 QLEEKENQIKSMKADIESLVTEKEALQKEGGNQQQAASEKESCITQLKKELSENINAVTL 1357

Query: 806 LEVHAFDKKLE — EMSCQVLQWQKQHQNDLKMLAAKEEQLREFQEEMAALKENLLEDDKE 863 

++ +KK+E +S Q+ Q QN + L+ KE + +++ K LL D +

Sbjct: 1358 MKEELKEKKVEISSLSKQLTDLNVQLQNSIS-LSEKEAAISSLRKQYDEEKCELL-DQVQ 1415 

Query: 864 PCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLRE QVNYIAKLSG 920 

++ K+ D +W K+ + + N ++E Q+ +K + 

Sbjct: 1416 DLSFKVDTLSKEKISALEQVDDWSNKFSEWKKKAQSRFTQHQNTVKELQIQLELKSKEAY 1475 

Query: 921 EKDH-LHSVMVHLQQENKK---LKKEIEEKKMKAE 951

EKD ++ + L Q+NK+ LK E+E+ K K E 
Sbjct: 1476 EKDEQINLLKEELDQQNKRFDCLKGEMEDDKSKME 1510 

Score = 332 (49.8 bits), Expect = 1.4e-25, P = 1.4e-25
Identities = 209/953 (21%), Positives = 438/953 (45%) 
Query: 1 MKDEAGERDREVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQLCMEEAMNS----SHD 56

MK + E+ ++ L+ K L+ + + + + R+R+ + ++ +E++ + S + 

Sbjct: 470 MKKSSEEQIAKLQKLHEKEIARK-EQELTKKLQTREREFQEQMKVALEKSQSEYLKISQE 528 

Query: 57 KKQAQALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQS 116

K+Q ++LA EE E++ K+ L + + KL LQQE E + + SL + 

Sbjct: 529 KEQQESLALEELELQ KKAILTESEN KLRDLQQEAETYRTRILELESSLEKSLQ 581 

Query: 117 ILEKQTSDLVLLHHHCKLKEDE— VILYEE EMGNHNENT— GEKLHLAQEQLALA 167 

+ Q+ DL + K K ++ ++ E+ E H ++ EKL + ++Q 

Sbjct: 582 ENKNQSKDLAVHLEAEKNKHNKEITVMVEKHKTELESLKHQQDALWTEKLQVLKQQYQTE 641 

Query: 168 GDKIASL— ERSLNLYRDK YQSSLS--NIELLECQVKMLQGELGGIMGQEPENKGDH 220 

+K+ + L +DK +Q+ + N + LE ++ + Q EL + + E 

Sbjct: 642 MEKLREKCEQEKETLLKDKEIIFQAHIEEMNEKTLE-KLDVKQTELESLSSELSEVLKAR 700 

Query: 221 SKVRIYTSPCMIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKL 280 

K+ S ++++ +T K E+ K+ +Q + Q+ + + + + ++R + +K 
Sbjct: 701 HKLEEELS--VLKD--QTDKMKQELEAKMDEQKNHHQQQVDSIIKEHEVSIQRTEKALKD 756

Query: 281 QADFASCTATHR--YPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEA 338 

Q + R + E+++ +K + + ++ +Q+ + +A 

Sbjct: 757 QINQLELLLKERDKHLKEHQAHVENLEADIKRSEGELQQASAKLDVFQSYQSATHEQTKA 816 

Query: 339 VSEQKRNIMKDMMKLELDLHGLREETSAHIERKDKDITILQCRLQELQLEFTETQKLTLK 398

EQ + + ++ LE + L ++ A +E + KD+ C EL + Q L + 

Sbjct:- 817 YE EQL AQLQQKLLDLET ER I LLT KQV - A EVE AQKKDV CT — ELDAHKIQVQDLMQQ 869 

Query: 399 KDKFLQEKDEMLQELEKKLTQVQNSLLKK-EKELEKQQCMATELEMTVKEAKQDKSKEAE 457 

+K + EM Q++ K LTQV S L+ KE E+ + + EE + + ++ + KE E 
Sbjct: 870 LEK QNSEMEQKV-KS LTQV YES KLEDGNKEQEQTKQILVEKENMILQMREGQKKEIE 925

Query: 458 C — KALQAEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRK — LQKGLLL 513 

+ L A+ + EE + + + ++ + K++A +++T +K L + L 

Sbjct: 926 ILTQKLSAKEDSIHILNEEYETKFKNQEKKMEKVKQKAK EMQETLKKKLLDQEAKL 981 

Query: 514 DKQKADTIQEL-QRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQ 572 

K+ +T EL Q+E Q K MA+ V L E + L ++ +R+ 

Sbjct: 982 KKELENTALELSQKEKQFNAKMLEMAQANSAGISDAVSRLETNQKEQIESL--TEVHRRE 1039 

Query: 573 LQKTVAEQDMKMNDMLDRIKHQHREQGSIKCKLEEDLQEATKLLEDKREQLKKS KE 628 

L ++ + K+N + ++ H Q K + +L++ L ++E++ K KE 
Sbjct: 1040 LNDVISIWEKKLNQQAEELQEIHEIQLQEKEQEVAELKQKILLFGCEKEEMNKEITWLKE 1099 

Query: 629 HEKLMEGELEALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSLNKYNTSQQ 688 

+ L L+++ K+K + NS L ++ L+A L+ L SL + Q+ 

Sbjct: 1100 EGVKQDTTLNELQEQLKQKSAHV— NS— LAQDETKLKAHLEKLEVDLNKSLKENTFLQE 1155 

Query: 689 VIQDLNKEIALQKESLMSLQAQL DKALQ — KEKHYLQTTITKEA YDALSRKSAA 740 

+ +L K + L ++L D+ Q K H ++ + LS + A 

Sbjct: 1156 QLVELKMLAEEDKRKVSELTSKLKTTDEEFQSLKSSHEKSNKSLEDKSLEFKKLSEE-LA 1214 

Query: 741 CQDDL TQAL EKLNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKL 790 

Q D+ T+AL E +N +S+T ++ ++ Q + +++E ++ + +L 

Sbjct: 1215 IQLDICCKKTEALLEAKTNELINISSSKTNAILSRISHCQHRTTKVKEALLIKTCTVSEL 1274 

Query: 791 NTELRKLRGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLREFQEEM 850 

+LR+L + +LEE Q+ K + D++ L ++E L Q+E 
Sbjct: 1275 EAQLRQLTEEQNTLNISFQQATHQLEEKENQI KSMKADIESLVTEKEAL QKEG 1327 

Query: 851 AALKENLLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLRE 910 

+ +KE C + Q + K+ N +T +++ K++KV L QL + 

Sbjct: 1328 G — NQQQAASEKESC-ITQ — LKKELSE NINAVTLMKEELKEKKVEISSLSKQLTD 1378 

Query: 911 ---QVNYIAKLSGEKDHLHSVMVHLQQENKKLKKEIEEKKMKAE 951

Q+ LS ++ + S+ +E +L ++++ K +

Sbjct: 1379 LNVQLQNSISLSEKEAAISSLRKQYDEEKCELLDQVQDLSFKVD 1422

Score = 329 (49.4 bits), Expect = 2.9e-25, P = 2.9e-25
Identities = 226/941 (24%), Positives = 444/941 (47%)

Query: 61 QALAFEESEVE — FGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSIL 118 

Q L E+ +++ S+ LR++ +L+++L + QQ + EE S QY S+L 

Sbjct: 165 QMLQREKKKLQGILSQSQDKSLRRIAELREELQMDQQAKKHLQEEFDASLEEKDQYISVL 224 

Query: 119 EKQTSDLVLLHHHCKLKEDEV ILYEEEMGNHNENT GEKL HLAQEQLALA 167 

+ QSL + + D+ ++E+ EN GE+ + + L 

Sbjct: 225 QTQVSLLKQRLRNGPMNVDVLKPLPQLEPQAEVFTKEENPESDGEPVVEDGTSVKTLETL 284 

Query: 168 GDKIASLERSLNLYRDKYQSSLSNIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIYT 227 
++ EL ++ QS LL + + LQ +L + QE E D + +

Sbjct: 285 QQRVKRQENLLKRCKETIQSHKEQCTLLTSEKEALQEQLDERL-QELEKIKD---LHMAE 340

Query: 228 SPCMIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASC 287

+I +++++++ Q +I E + ++ L ++ E+ + +L++

Sbjct: 341 KTKLITQLRDAKNLIEQLEQDKGM---VIAETKRQM--HETLEMKEEE-IAQLRSRIKQM 394

Query: 288 TATH---RYPPSSSEEC--EDIKKILKHLQEQKDSQCLHVEEYQNLVKDL-------RVE 335

T R SE E+++K L Q+ ++++ E +K + R+

Sbjct: 395 TTQGEELREQKEKSERAAFEELEKALSTAQKTEEARRKLKAEMDEQIKTIEKTSEEERIS 454

Query: 336 LEA-VSEQKRNIMKDMMKL--ELDLHGLREETSAHIERKDKDITILQCRLQELQLEFTET 392

L+ +S K+ ++ D+MK E + L++ + RK++++T +LQ + EF E

Sbjct: 455 LQQELSRVKQEVV-DVMKKSSEEQIAKLQKLHEKELARKEQELTK---KLQTREREFQEQ 510

Query: 393 QKLTLKKDKFLQEKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAKQDK 452

K+ L+K + E ++ QE E+ Q SL +E EL+K+ + TE E +++ +Q+

Sbjct: 511 MKVALEKSQ--SEYLKISQEKEQ-----QESLALEELELQKKAIL-TESENKLRDLQQE- 561

Query: 453 SKEAECKALQAEVQKLKNSLEEAKQQER-----LAAQQAAQCKEEAALAGCHLEDTQR-K 506

++ + L+ E L+ SL+E K Q + L A++ KE + H + + K

Sbjct: 562 AETYRTRILELE-SSLEKSLQENKNQSKDLAVHLEAEKNKHNKEITVMVEKHKTELESLK 620

Query: 507 LQKGLLLDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRK-LEN 565

Q+ L + + Q+ Q E++ L +E EKE K + + E K LE

Sbjct: 621 HQQDALWTEKLQVLKQQYQTEMEKL-REKCEQEKETLLKDKEII-FQAHIEEMNEKTLEK 678

Query: 566 SDKEKRQLQKTVAEQDMKMNDMLDRIKHQHREQGSI-KCKLEEDLQEA-TKLLEDKR--E 621

D ++ +L+ +E ++++L + +H+ E+ S+ K + ++ QE K+ E K +

Sbjct: 679 [sequence illegible] 733

Query: 622 QLKKS--KEHEKLMEGELEALRQEFKKKDKTLKENSRKLEEEN---ENLRAELQCCSTQL 676

Q S KEHE ++ +AL+ + + + LKE + L+E ENL A+++ +L

Sbjct: 734 QQVDSIIKEHEVSIQRTEKALKDQINQLELLLKERDKHLKEHQAHVENLEADIKRSEGEL 793

Query: 677 ESSLNKYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSR 736

+ + K + Q +++ +E L LQ +L L+ E+ L TK+ + ++

Sbjct: 794 QQASAKLDVFQSYQSATHEQTKAYEEQLAQLQQKL-LDLETERILL----TKQVAEVEAQ 848

Query: 737 KSAACQD DLTQALEKLNHVTSETKSLQQSLTQTQEKKAQ -- LEEEIIAYEE 785

k c + nr. o t,fk n *sf + +sltd F K + +E+ +

Sbjct: 849 [sequence illegible] 905

Query: 786 RMKKLNTELRKLRGFHQESELEVHAFDKKLEEMSCQVL--QWQKQHQNDLKMLAAKEEQL 843

+ +K N L+ G Q+ E+E+ +E S +L +++ + +N K + +++

Sbjct: 906 [sequence illegible] 963

Query: 844 REFQEEMAALKENLLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKV 899

+F CiV T.K+ T.T.+ + + T. + + T. + O •+• + A+

Sbjct: 964 KEMQE TLKKKLLDQEAK LKK-ELENTALELSQKEKQFNAKMLEMAQANSAGI S D 1016

Query: 900 ANEKLGNQLREQVNYIAKLSG-EKDHLHSVMVH-LQQENKKLKK--EIEEKKMKAENTRL 955

A +L +EQ+ + ++ E + + S+ L Q+ ++L++ EI+ ++ + E L

Sbjct: 1017 AVSRLETNQKEQIESLTEVHRRELNDVISIWEKKLNQQAEELQEIHEIQLQEKEQEVAEL 1076

Query: 956 CTKALGPSRTESTQREKVCGTLGWKGLPQD 985

K L E + K L +G+ QD

Sbjct: 1077 KQKIL-LFGCEKEEMNKEITWLKEEGVKQD 1105

Score = 326 (48.9 bits), Expect = 6.0e-25, P = 6.0e-25
Identities = 220/907 (24%), Positives = 444/907 (48%)

Query: 67 ESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSILE---KQTS 123

E+E G+S + QL Q +++ EL T+Y L++ + L+ Q+

Sbjct: 123 EAEDLVGNSDSLNKEQLIQRLRRMERSLSSYRGKYSELVTAYQMLQREKKKLQGILSQSQ 182

Query: 124 DLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQLALAGDKIASLERSLNLYRD 183

D L +L+E+ + +++ H + E+ + E+ I+ L+ ++L +

Sbjct: 183 DKSL-RRIAELREE--LQMDQQAKKHLQ---EEFDASLEE---KDQYISVLQTQVSLLKQ 233

Query: 184 KYQSSLSNIELLECQVKMLQGELGGIMGQE-PENKG-----DHSKVR-IYTSPCMIQEHQ 236

+ ++ N+++L+ + L+ + +E PE+ G D + V+ + T ++ +

Sbjct: 234 RLRNGPMNVDVLK-PLPQLEPQAEVFTKEENPESDGEPVVEDGTSVKTLETLQQRVKRQE 292

Query: 237 ETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASCTATHRYPPS 296

KR E Q +Q L+ K A L ER + L K++ D T

Sbjct: 293 NLLKRCKETIQSHKEQCTLLTS--EKEALQEQLD-ERLQELEKIK-DLHMAEKTKLIT-- 346

Query: 297 SSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRNIMKDMMKLELD 356 

+ D K +++ L++ K +E+++L++E++QR++KM + 
Sbjct: 347 QLRDAKNLIEQLEQDKGM--VIAETKRQMHETLEMKEEEIA-QLRSRIKQMTTQGEE 400 
Query: 357 LHGLREETS-AHIERKDKDITILQCRLQE----LQLEFTETQKLTLKKDKFLQEKDEMLQ 411

L +E++ A E +K ++ Q + +E L+ E E K T++K +E+ + Q 
Sbjct: 401 LREQKEKSERAAFEELEKALSTAQ-KTEEARRKLKAEMDEQIK-TIEKTSE-EERISLQQ 457 

Query: 412 ELEKKLTQVQNSLLKK-EKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAEVQKLKN 470 

EL + +V + + K E+++ K Q + E E+ KE Q+ +K+ + + + + Q +K 
Sbjct: 458 ELSRVKQEVVDVMKKSSEEQIAKLQKLH-EKELARKE— QELTKKLQTREREFQEQ-MKV 513 

Query: 471 SLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQ-KGLLLD-KQKADTIQELQREL 528 

+LE++ QEL Q++EAL L+ ++LD +Q+A+T + EL 

Sbjct: 514 ALEKS-QSEYLKISQEKEQQESLALEELELQKKAILTESENKLRDLQQEAETYRTRILEL 572 

Query: 529 QMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENS-DKEKRQLQKTVAEQDMKMNDM 587 

+ ES+E+S VLE++ +++ +K K +L+ +QD + 
Sbjct: 573 ES-SLEKSLQENKNQSKDLAVH-LEAEKNKHNKEITVMVEKHKTELESLKHQQDALWTEK 630 

Query: 588 LDRIKHQHR-EQGSIKCKLEEDLQEATKLLEDKRE--QLKKSKEHEKLMEGELEALRQEF 644 

L +K Q++ E ++ K E QE LL+DK Q + +EK +E +L+ + E 
Sbjct: 631 LQVLKQQYQTEMEKLRE KC E QEKETLLKDKEIIFQAHIEEMNEKTLE-KLDVKQTEL 686

Query: 645 KKKDKTLKE--NSR-KLEEENENLRAELQCCSTQLESSLNKY-NTSQQVIQDLNKE--IA 698

+ L E +R KLEEE L+ + +LE+ +++ N QQ + + KE ++ 

Sbjct: 687 ESLSSELSEVLKARHKLEEELSVLKDQTDKMKQELEAKMDEQKNHHQQQVDSIIKEHEVS 746 

Query: 699 LQK-ESLMSLQA-QLDKAL-QKEKHYLQTTITKEAYDALSRKS AACQDDLTQAL 749 

+Q+ E + Q QL+ L +++KH + E +A ++S A+ + D+ Q+ 

Sbjct: 747 IQRTEKALKDQINQLELLLKERDKHLKEHQAHVENLEADIKRSEGELQQASAKLDVFQSY 806 

Query: 750 EKLNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELRKLRGFHQESELEVH 809 

+ H +TK+ ++ L Q Q+K LE E I +++ ++ + + + +++V 
Sbjct: 807 QSATH — EQTKAYEEQLAQLQQKLLDLETERILLTKQVAEVEAQKKDVCTELDAHKIQVQ 864 

Query: 810 AFDKKLEEMSCQVLQWQKQHQN--DLKMLAAKEEQLREFQEEMAALKENLL EDDKE 863 

++LE+ + ++ Q K + K+ +EQ E +++ KEN++ E K+ 

Sbjct: 865 DLMQQLEKQNSEMEQKVKSLTQVYESKLEDGNKEQ--EQTKQILVEKENMILQMREGQKK 922 

Query: 864 PC-CLPQ-WSVPKDTCRLYRGNDQIMTNLE-QWAKQQKVANE--KLGNQLREQV-NYIAK 917 

L Q S +D+ + N++ T + Q K +KV + ++ L++++ + AK 
Sbjct: 923 EIEILTQKLSAKEDSIHIL--NEEYETKFKNQEKKMEKVKQKAKEMQETLKKKLLDQEAK 980

Query: 918 LSGEKDHLHSVMVHLQQENKKLKKEIEEKKMKAENTRLCTKALGPSRTESTQREKV 973 

L K L + + L Q+ K+ ++ E M N+ + A+ SR E+ Q+E++ 
Sbjct: 981 L KKELENTALELSQKEKQFNAKMLE — MAQANSAGISDAV--SRLETNQKEQI 1029 

Score = 318 (47.7 bits), Expect = 4.4e-24, P = 4.4e-24
Identities = 184/827 (22%), Positives = 405/827 (48%) 

Query: 1 MKDEAGERDREVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKK-Q 59 

++ E G + + S S + L+ ++ + ++ L++ ++ + D Q 

Sbjct: 1323 LQKEGGKQQQAASEKESCITQLKKELSENINAVTLMKEELKEKKVEISSLSKQLTDLNVQ 1382 

Query: 60 AQ-ALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYS-LRQYQS- 116 

Q +++ EE S + +Q + K +LL Q+L F + L S L Q 

Sbjct: 1383 LQNSISLSEKEAAISSLR KQYDEEKCELLDQVQDLSFKVDTLSKEKISALEQVDDW 1438 

Query: 117 ILE-KQTSDLVLLHHHCKLKEDEVILY E EEMGNHNENTGEKLHLAQEQLALAGDKIA 172

E K+ + H +KE ++ L + + ++ E+++L +E+L + 

Sbjct: 1439 SNKFSEWKKKAQSRFTQHQNTVKELQIQLELKSKEAYEKD--EQINLLKEELDQQNKRFD 1496 

Query: 173 SLERSLNLYRDKYQSSLSNIEL-LECQVKMLQGELGGIMGQEP-ENKGDHSKVRIYTSPC 230 

L+ + + K + SN+E L+ Q + EL + Q+ E + + ++ Y 
Sbjct: 1497 CLKGEMEDDKSKMEKKESNLETELKSQTARIM-ELEDHITQKTIEIESLNEVLKNYNQQK 1555 

Query: 231 MIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASCTAT 290 

I EH+E ++L + ++D+ ++E K+ L LE + +K + + 

Sbjct: 1556 DI-EHKELVQKLQHFQELGEEKDNRVKEAEEKI LTLENQVYSMKAELETKKKELE 1609 

Query: 291 HRYPPSSSEECEDIKKILKHLQEQKDSQCLHVE-EYQNLVKDLRVELEAVSEQKRNIMKD 349 

H S+E E++K + L+ + ++ ++ + + + ++ +L + E+K ++ 
Sbjct: 1610 HVNLSVKSKE-EELKALEDRLESESAAKLAELKRKAEQKIAAIKKQLLSQMEEK EE 1664 

Query: 350 MMKLELDLHGLREETSAHIERKDKDITILQCRLQELQLEFTETQKL — TLKKDKFLQEKD 407 

K + H E + ++ +++++ IL+ +L+ ++ +ET + + K E++ 
Sbjct: 1665 QYKKGTESH— LSELNTKLQEREREVHILEEKLKSVESSQSETLIVPRSAKNVAAYTEQE 1722 

Query: 408 EM LQEL-EKKLTQVQNSLLKKEKEL EKQQCMATELEMTVK-EAKQDKSKE 455

E +Q+ E+K++ +Q +L +KEK L EK++ +++ EM + + + K + 

Sbjct: 1723 EADSQGCVQKTYEEKISVLQRNLTEKEKLLQRVGQEKEETVSSHFEMRCQYQERLIKLEH 1782 

Query: 456 AECKAL — QAEVQKLKNSLEEAKQQERLAAQQAAQCK — EEAALAGCHLEDTQRKLQKGL 511 



AE K Q+ + L+ LEE ++ L Q + + + A +LE+ +QK L 
Sbjct: 1783 AEAKQHEDQSMIGHLQEELEEKNKKYSLIVAQHVEKEGGKNNIQAKQNLENVFDDVQKTL 1842 

Query: 512 LLDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELS--LELSEALRKLENSDKE 569

++K T Q L+++++ L +S + +++ +R +EEL+ E +AL++++ +K 
Sbjct: 1843 QEKELTCQILEQKIKEL — DSCLVRQKEV-HRVEMEELTSKYEKLQALQQMDGRNKP 1896 

Query: 570 KRQLQKTVAEQD---MKMNDMLDRIKHQHREQGSIKCKLEEDLQEATKLLEDKREQLKK- 625

L++ E+ + +L ++ QH + E + Q+ K + ++ L+ 

Sbjct: 1897 TELLEENTEEKSKSHLVQPKLLSNMEAQHNDLEFKLAGAEREKQKLGKEIVRLQKDLRML 1956 

Query: 626 SKEHEKLMEGELEALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSLNKYNT 685

KEH++ ELE L++E+ + E K+++E E+L EL+ ST L+ + ++NT 

Sbjct: 1957 RKEHQQ ELEILKKEYDQ EREEKIKQEQEDL— ELKHNST-LKQLMREFNT 2003 

Query: 686 S-QQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDD 744

Q Q+L I ++A+L ++ Q+E + L IEDLR+A ++ 

Sbjct: 2004 QLAQKEQELEMTIKETINKAQEVEAELLESHQEETNQLLKKIA-EKDDDLKR-TAKRYEE 2061 

Query: 745 LTQALEKLNHVTSETKSLQQSLTQTQEKKAQ-LEEEIIAYEERMK — KLNTELRKLRGFH 801 

+ A E+ +T++ + LQ L + Q+K Q LE+E + + +L T+L + 

Sbjct: 2062 ILDAREE — EMTAKVRDLQTQLEELQKKYQQKLEQEENPGNDNVTIMELQTQLAQKTTLI 2119 

Query: 802 QESELEVHAFDKKLEEMSCQVLQWQK 827 

+S+L+ F +++ + ++ +++K 
Sbjct: 2120 SDSKLKEQEFREQIHNLEDRLKKYEK 2145 

Score = 316 (47.4 bits), Expect = 7.1e-24, P = 7.1e-24 
Identities = 213/977 (21%), Positives = 454/977 (46%)

Query: 4 EAGERD-REVSSLNSKLLSLQLD-IKNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKKQAQ 61 

E R+ +V S+ K L+ Q + ++ +H++ +QK++L++ +++ 
Sbjct: 1034 EVHRRELNDVISIWEKKLNQQAEELQEIHEI-QLQEKEQEVAELKQKILLFGCEKEEMNK 1092

Query: 62 ALAFEESEVEFGSSKQCHLRQLQ-QLKKKLL VLQQE— LEFHTEELQTSYYSLRQY 114 

+ + + E G + L +LQ QLK+K + Q E L+ H E+L+ + 

Sbjct: 1093 EITWLKEE GVKQDTTLNELQEQLKQKSAHVNSLAQDETKLKAHLEKLEVDLNKSLKE 1149 

Query: 115 QSILEKQTSDLVLLHHHCKLKEDEV— ILYEEEMGNHNENTGEKLHLAQEQLALAGDKI 171 

+ L++Q +L +L K K E+ + +E +++ EK + + E +L K+ 

Sbjct: 1150 NTFLQEQLVELKMLAEEDKRKVSELTSKLKTTDEEFQSLKSSHEKSNKSLEDKSLEFKKL 1209 

Query: 172 AS-LERSLNLYRDKYQSSLS — NIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIYTS 228 

+ L L++ K ++ L EL+ L I +++ K + 

Sbjct: 1210 SEELAIQLDICCKKTEALLEAKTNELINISSSKTNAILSRI — SHCQHRTTKVKEALLIK 1267 

Query: 229 PCMIQEHQ ETQKRLSEVWQKVSQQ-DDLIQELRNKLACSNALVLEREKALIKL 280 

C + E + E Q L+ +Q+ + Q ++ ++++ A +LV E+E L 
Sbjct: 1268 TCTVSELEAQLRQLTEEQNTLNISFQQATHQLEEKENQIKSMKADIESLVTEKEA L 1323 

Query: 281 QADFASCTATHRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVS 340 

Q++ + SECI++KLE++LEE +K+ +VE+ ++S 
Sbjct: 1324 QKEGGN QQQAASEKESC — ITQLKKELSENINAVTLMKEE LKEKKVEISSLS 1373 

Query: 341 EQKRNIMKDMMKLELDLHGLREETSAHIERKDKDITILQCRLQEL—QLEFTETQKLT-L 397 

+Q ++ + + L S+ ++ D++ L ++Q+L +++ +K++ L 

Sbjct: 1374 KQLTDLNVQLQN-SISLSEKEAAISSLRKQYDEEKCELLDQVQDLSFKVDTLSKEKISAL 1432 

Query: 398 KK-DKFLQEKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTV KEAKQDKS 453 

++ D + + E ++ + + TQ QN++ + + +LE + A E + + KE ++ 
Sbjct: 1433 EQVDDWSNKFSEWKKKAQSRFTQHQNTVKELQIQLELKSKEAYEKDEQINLLKEELDQQN 1492 

Query: 454 KEAECKALQAEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLE-DTQRKLQKGLL 512

K +C + E K K +E+ + L +Q A + E + +E ++ ++ K 
Sbjct: 1493 KRFDCLKGEMEDDKSKMEKKESNLETELKSQTARIMELEDHITQKTIEIESLNEVLKNY- 1551 

Query: 513 LDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQ 572 

++QK +EL ++LQ Q+ + +++ L ++ +LE KE 

Sbjct: 1552 -NQQKDIEHKELVQKLQHFQELGEEKDNRVKEAEEKILTLENQVYSMKAELETKKKELEH 1610 

Query: 573 LQKTVAEQDMKMNDMLDRIKHQHREQ-GSIKCKLEEDLQEATKLL EDKREQLKKSK 627 

+ +V ++ ++ + DR++ + + +K K E+ + K L E+K EQ KK 
Sbjct: 1611 VNLSVKSKEEELKALEDRLESESAAKLAELKRKAEQKIAAIKKQLLSQMEEKEEQYKKGT 1670 

Query: 628 EHEKLMEGELEALRQEFKKKDKTLKENSRKLEE-ENENL RAELQCCSTQLESSLNK 682

E EL QE +++ L+E + +E ++E L A+ T+ E + ++ 

Sbjct: 1671 ESHL SELNTKLQEREREVHILEEKLKSVESSQSETLIVPRSAKNVAAYTEQEEADSQ 1727 

Query: 683 YNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSA 739 

T ++ I L + + +KE L+ Q+K H+ +E L A 

Sbjct: 1728 GCVQKTYEEKISVLQRNLT-EKEKLLQRVGQ-EKEETVSSHFEMRCQYQERLIKLEHAEA 1785
Query: 740 ACQDDLTQALEKLNHVTSET--KSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELRKL 797

+D Q++ + H+ E K+ + SL Q + + + I ++ ++ + +++K 
Sbjct: 1786 KQHED--QSM— IGHLQEELEEKNKKYSLIVAQHVEKEGGKNNIQAKQNLENVFDDVQKT 1841 

Query: 798 RGFHQESELEVHAFDKKLEEM-SCQVLQWQKQHQNDLKMLAAKEEQLREFQEEMAALKEN 856 

QE EL ++K++E+ SC V Q ++ H+ +++ L +K E+L+ Q+ K 

Sbjct: 1842 L QEKELTCQILEQKIKELDSCLVRQ-KEVHRVEMEELTSKYEKLQALQQMDGRNKPT 1897 

Query: 857 -LLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQVNYI 915 

LLE++ E PK + ++ + L A+++K +KLG ++ + 

Sbjct: 1898 ELLEENTEEKSKSHLVQPKLLSNMEAQHNDLEFKLAG-AEREK— QKLGKEIVRLQKDL 1953 

Query: 916 AKLSGE-KDHLHSVMVHLQQENK-KLKKEIEEKKMKAENTRLCTKALGPSRTESTQREK 972 

L E + L + QE + K+K+E E+ ++K +T + + T+ Q+E+ 

Sbjct: 1954 RMLRKEHQQELEILKKEYDQEREEKIKQEQEDLELKHNST--LKQLMREFNTQLAQKEQ 2010 

Score = 301 (45.2 bits), Expect = 2.9e-22, P = 2.9e-22
Identities = 221/952 (23%), Positives = 441/952 (46%)

Query: 1 MKDEAGERDREVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQL CMEEAMNSSHD- 56 

+K A E R+VS L SKL + + ++L ++ K+L+D L + E + D 
Sbjct: 1160 LKMLAEEDKRKVSELTSKLKTTDEEFQSLKSSHEKSNKSLEDKSLEFKKLSEELAIQLDI 1219 

Query: 57 --KKQAQALAFEESE-VEFGSSK-QCHLRQLQQLKKKLLVLQQELEFHT EELQTSYY 109 

KK L + +E + SSK L ++ + + +++ L T EL+ 
Sbjct: 1220 CCKKTEALLEAKTNELINISSSKTNAILSRISHCQHRTTKVKEALLIKTCTVSELEAQLR 1279

Query: 110 SLRQYQSILEKQTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQE QLAL 166 

L + Q+ L H + KE+++ + ++ EK L +E Q 

Sbjct: 1280 QLTEEQNTLNI S FQQAT HQLEEKENQI KSMKADI ESLVTEKEALQKEGGNQQQA 1333 

Query: 167 AGDKIASLERSLNLYRDKYQSSLSNIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIY 226 

A +K E + + + +++ + L++ ++K + E+ + Q + V++ 
Sbjct: 1334 ASEK ESCITQLKKELSENINAVTLMKEELKEKKVEISSLSKQLTD LNVQLQ 1384 

Query: 227 TSPCMIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFAS 286 

S + ++ + ++ + D +Q+L K+ + L E+ AL ++ D+++ 
Sbjct: 1385 NSISLSEKEAAISSLRKQYDEEKCELLDQVQDLSFKV DTLSKEKISALEQVD-DWSN 1440 

Query: 287 CTATHRYPPSS — SEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKD LRVE-LE 337 

+ + S ++ +K++ L E K + +E NL+-K+ R + L+ 

Sbjct: 1441 KFSEWKKKAQSRFTQHQNTVKELQIQL-ELKSKEAYEKDEQINLLKEELDQQNKRFDCLK 1499 

Query: 338 AVSEQKRNIM-KDMMKLELDLHGLRE ETSAHIERKDKDITILQCRLQEL-QLEFTET 392 

E ++ M K LE +L E HI +K +I L L+ Q + E

Sbjct: 1500 GEMEDDKSKMEKKESNLETELKSQTARIMELEDHITQKTIEIESLNEVLKNYNQQKDIEH 1559 

Query: 393 QKLTLKKDKFLQ EKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAK 449 

++L K F + EKD ++E E+K+ ++N + + ELE ++ + ++VK 
Sbjct: 1560 KELVQKLQHFQELGEEKDNRVKEAEEKILTLENQVYSMKAELETKKKELEHVNLSVK 1616 

Query: 450 QDKSKEAECKALQAEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQK 509 

SKE E KAL+ ++ S + + +R A Q+ A K++ +E+ + + +K 

Sbjct: 1617 SKEEELKALEDRLES — ESAAKLAELKRKAEQKIAAIKKQLL SQMEEKEEQYKK 1668 

Query: 510 GLLLDKQKADT-IQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDK 568 

G + +T +QE +RE+ +L+++ EQ+ + S+ A+E+D 

Sbjct: 1669 GTESHLSELNTKLQEREREVHILEEKLKSVESSQSETL--IVPRSAKNVAAYTEQEEADS 1726

Query: 569 E----KRQLQK-TVAEQDMKMND-MLDRIKHQHREQGSIKCKLEEDLQEATKLLEDKREQ 622

+ K +K +V ++++ + +L R+ Q +E+ ++ E Q +L+ K E
Sbjct: 1727 QGCVQKTYEEKISVLQRNLTEKEKLLQRVG-QEKEE-TVSSHFEMRCQYQERLI--KLEH 1782

Query: 623 LKKSKEHE-KLMEGEL-EALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSL 680 

+ +K+HE +MGLEL++KK +++ KE N++A+ LE 
Sbjct: 1783 AE-AKQHEDQSMIGHLQEELEEKNKKYSLIVAQHVEK-EGGKNNIQAK QNLE— 1832 

Query: 681 NKYNTSQQVIQDLNKEIALQKESLMSLQAQLDKAL— QKEKHYLQTTITKEAYDALSR-K 737 

N ++ Q+ +Q+ KE+ Q L +LD L QKE H ++ Y+ L + 

Sbjct: 1833 NVFDDVQKTLQE--KELTCQ--ILEQKIKELDSCLVRQKEVHRVEMEELTSKYEKLQALQ 1888

Query: 738 SAACQDDLTQALEKLNHVTSETKSLQQSLTQTQEKKAQ-LEEEIIAYEERMKKLNTEL — 794 

++ T+ LE+ S++ +Q L E + LE ++ E +KL E+ 

Sbjct: 1889 QMDGRNKPTELLEENTEEKSKSHLVQPKLLSKMEAQHNDLEFKLAGAEREKQKLGKEIVR 1948 

Query: 795 — RKLRGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLREFQEEMAA 852 

+ LR +E + E+ K+ ++ + ++ Q+Q +LK + ++ +REF ++A 
Sbjct: 1949 LQKDLRMLRKEHQQELEILKKEYDQEREEKIK-QEQEDLELKHNSTLKQLMREFNTQLAQ 2007

Query: 853 LKENLLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQV 912
++ L KE Q V + + Q TN Q K K+A EK + R
Sbjct: 2008 KEQELEMTIKETINKAQ-EVEAELLESH QEETN— QLLK— KIA-EKDDDLKRTAK 2057 

Query: 913 NYIAKLSGEKDHLHSVMVHLQQENKKLKKEIEEKKMKAEN 952

Y L ++ + + + LQ + ++L+K+ ++K + EN 
Sbjct: 2058 RYEEILDAREEEMTAKVRDLQTQLEELQKKYQQKLEQEEN 2097 

Score = 300 (45.0 bits), Expect = 3.7e-22, P = 3.7e-22
Identities = 195/961 (20%), Positives = 435/961 (45%)

Query: 1 MKDEAGERDREVSSLNSKLLSLQLDIKN-- LHDVCKRQRKTLQDNQLCMEEAMNSSHDKK 58 

+KD+ + +N K L +LD+K L + + L+ +EE ++ D+ 

Sbjct: 657 LKDKEIIFQAHIEEMNEKTLE-KLDVKQTELESLSSELSEVLKARHK-LEEELSVLKDQT 714 

Query: 59 QAQALAFEESEVEFGSSKQCHLRQLQQLKKKLLV-LQQELEFHTEELQTSYYSLRQYQSI 117 

+E E + K H +Q+ + K+ V +Q+ + +++ L++ 
Sbjct: 715 DKMK QELEAKMDEQKNHHQQQVDSIIKEHEVSIQRTEKALKDQINQLELLLKERDKH 771 

Query: 118 LEKQTSDLVLLHHHCKLKEDEVILYEEEMG NHNENTGEKLHLAQEQLALAGDKIASL 174 

L++ + + I* K E E+ ++ ++ T E+ +EQLA K+ L 

Sbjct: 772 LKEHQAHVENLEADIKRSEGELQQASAKLDVFQSYQSATHEQTKAYEEQLAQLQQKLLDL 831 

Query: 175 ERSLNLYRDKYQSSLSNIELLECQVKMLQGELGGIMGQ-EPENKGDHSKVRIYTSPCMIQ 233 

E L + + + + ++ + ++ +M Q E +N KV+ T 

Sbjct: 832 ETERILLTKQVAEVEAQKKDVCTELDAHKIQVQDLMQQLEKQNSEMEQKVKSLTQ-VYES 890 

Query: 234 EHQETQKRLSEVWQKVSQQDDLIQELRN KLACSNALVLEREKALIKLQADFASCTA 289 

+ ++ K + Q + +++++I ++R ++ + +E ++ L ++ + 
Sbjct: 891 KLEDGNKEQEQTKQILVEKENMILQMREGQKKEIEILTQKLSAKEDSIHILNEEYET 947 

Query: 290 THRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRNIMKD 349

++ + ++ E +K+ K +QE + L E L K+L +S++++ 
Sbjct: 948 --KFK-NQEKKMEKVKQKAKEMQETLKKKLLDQEA— KLKKELENTALELSQKEKQFNAK 1002 

Query: 350 MMKL-ELDLHGLREETSA-HIERKDKDITILQCRLQELQLEFTETQKLTLKKDKFLQEKD 407 

M+++ + + G+ + S +K++ ++ + +EL + +K ++ + LQE 

Sbjct: 1003 MLEMAQANSAGI S DAVSRLETNQKEQI ESLTEVHRRELNDVI S IWEKKLNQQAEELQEI H 1062 

Query: 408 EM-LQELEKKLTQVQNSLLK KEKELEKQQCMATE LEMTVKEAKQD-KSKEAEC 458 

E+ LQE E+++ +++ +L +++E+ K+ E + T+ E ++ K K A 

Sbjct: 1063 EIQLQEKEQEVAELKQKILLFGCEKEEMNKEITWLKEEGVKQDTTLNELQEQLKQKSAHV 1122 

Query: 459 KALQAEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKA 518 

+L + KLK LE+ + + ++ +E+ E+ +RK+ + L K K 

Sbjct: 1123 KSLAQDETKLKAHLEKLEVDLNKSLKENTFLQEQLVELKMLAEEDKRKVSE — LTSKLKT 1180 

Query: 519 DTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQLQKTVA 578 

T +E Q +K + E + +K EEL+++L +K E + K + + 

Sbjct: 1181 -TDEEFQSLKSSHEKSNKSLEDKSLEFKKLSEELAIQLDICCKKTEALLEAKTN — ELIN 1237 

Query: 579 EQDMKMNDMLDRIKH-QHREQGSIKCKLEEDLQEATKLLEDKREQLKKSKEHEKLMEGEL 637

K N +L RI H QHR K++E L T + +• QL++ E + + 

Sbjct: 1238 ISSSKTNAILSRISHCQHRTT KVKEALLI KTCTVSELEAQLRQLTEEQNTLNI S F 1292 

Query: 638 EALRQEFKKKD— KTLKENSRKLEEENENLR AE LQCC S TQLE SSL 680 

+ + ++K+ K++K + L E E L+ +E + C TQL+ L 

Sbjct: 1293 QQATHQLEEKENQIKSMKADIESLVTEKEALQKEGGNQQQAASEKESCITQLKKELSENI 1352 

Query: 681 NKYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQ-KEKHYLQTTITKEAYDALSRKSA 739 

N ++ +++ EI+ + L L QL ++ EK +++ K+ YD + 

Sbjct: 1353 NAVTLMKEELKEKKVEISSLSKQLTDLNVQLQNSISLSEKEAAISSLRKQ-YDEEKCELL 1411 

Query: 740 ACQDDLTQALEKLN-HVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELR-KL 797 

DL+ ++ L+ S + + + E K + + ++ +K+L +L K 

Sbjct: 1412 DQVQDLSFKVDTLSKEKISALEQVDDWSNKFSEWKKKAQSRFTQHQNTVKELQIQLELKS 1471 

Query: 798 RGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLR-EFQEEMAALKEN 856 

+ +++ E +++ ++L++ + + + + ++D + KE L E++A+E 
Sbjct: 1472 KEAYEKDE-QINLLKEELDQQNKRFDCLKGEMEDDKSKMEKKESNLETELKSQTARIME- 1529 

Query: 857 LLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQVNYIA 916 

LED + + T + N+ ++ N Q QK K +L +++ + 
Sbjct: 1530 -LEDH ITQKTIEIESLNE-VLKNYNQ QKDIEHK ELVQKLQHFQ 1570 

Query: 917 KLSGEKDH LHSVMVHLQQENKKLKKEIEEKKMKAENTRLCTKA 959 

+L EKD+ ++ L+ + +K E+E KK + E+ L K+ 

Sbjct: 1571 ELGEEKDNRVKEAEEKILTLENQVYSMKAELETKKKELEHVNLSVKS 1617 

Score = 298 (44.7 bits), Expect = 6.1e-22, P = 6.1e-22
Identities = 207/886 (23%), Positives = 412/886 (46%)

Query: 47 MEEAMNSSHDKKQAQALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQT 106

+ EN++ Q EEE+SK ++L + LQ+E + 

Sbjct: 1281 LTEEQNTLNISFQQATHQLEEKENQIKSMKA DIESLVTEKEALQKEGGNQQQAASE 1336 

Query: 107 SYYSLRQYQSILEKQTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQLAL 166 

+ Q + L + + + L+ K K+ E+ +++ + N + L++++ A 

Sbjct: 1337 KESCITQLKKELSENINAVTLMKEELKEKKVEISSLSKQLTDLNVQLQNSISLSEKEAA- 1395

Query: 167 AGDKIASLERSLNLYRDKYQSSLSNIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIY 226 

I+SL + Y ++ L ++ L +V L E + Q + S+ + 

Sbjct: 1396 ISSLRKQ YDEEKCELLDQVQDLSFKVDTLSKEKISALEQVDDWSNKFSEWK-K 1447 

Query: 227 TSPCMIQEHQETQKRLS EVWQKVSQQDDLIQEL — RNK-LACSNALVLE 272 

+ +HQ T K L E ++K Q + L +EL +NK C + + 

Sbjct: 1448 KAQSRFTQHQNTVKELQIQLELKSKEAYEKDEQINLLKEELDQQNKRFDCLKGEMEDDKS 1507 

Query: 273 -REKALIKLQADFASCTAT HRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQN 327 

EK L+ + S TA + + E E + ++LK+ +QKD E++ 
Sbjct: 1508 KMEKKESNLETELKSQTARIMELEDHITQKTIEIESLNEVLKNYNQQKDI EHKE 1561 

Query: 328 LVKDLRVELEAVSEQKRNIMKDMMKLELDLHGLREETSAHIERKDKDI--TILQCRLQEL 385 

LV+ L+ + + E+K N +K+ + L L A +E K K++ L + +E 

Sbjct: 1562 LVQKLQ-HFQELGEEKDNRVKEAEEKILTLENQVYSMKAELETKKKELEHVNLSVKSKEE 1620 

Query: 386 QLEFTETQKLTLKKDKFLQEKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTV 445 

+L+ E + L+ + + E+ ++ E+K+ ++ LL + +E E+Q TE ++ 
Sbjct: 1621 ELKALEDR LESES-AAKLAELKRKAEQKIAAIKKQLLSQMEEKEEQYKKGTESHLSE 1676 

Query: 446 KEAKQDKSKEAECKALQAEVQKLKNSLEEAKQQERLAAQQAAQCK-EEAALAGCHLEDTQ 504

K + +E E L+ +++ +++S E R A AA + EEA GC + + 

Sbjct: 1677 LNTKLQE-REREVHILEEKLKSVESSQSETLIVPRSAKNVAAYTEQEEADSQGCVQKTYE 1735

Query: 505 RKLQKGLLLDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLE 564 

K+ +L + + + LQR Q +KE +++ + R + +E ++L A K 
Sbjct: 1736 EKI S VLQRNLTEKEKLLQRVGQ--EKEETVSSHFEM--RCQYQERLI KLEHAEAKQH 1788

Query: 565 NSDKEKRQLQKTVAEQDMKMNDMLDRIKHQHREQG — SIKCK — LE EDLQ E 611 

LQ+ + E++ K + ++ +H +E G +I+ K LE +D+Q E
Sbjct: 1789 EDQSMIGHLQEELEEKNKKYSLIV--AQHVEKEGGKNNIQAKQNLENVFDDVQKTLQEKE 1846

Query: 612 AT-KLLEDKREQLKKSKEHEKLMEG-ELEALRQEFKKKDKTLKENSR KLEEENENL 665 

T ++LE K ++L +K + E+E L +++K '+ + R +L EEN 

Sbjct: 1847 LTCQILEQKIKELDSCLVRQKEVHRVEMEELTSKYEKLQALQQMDGRNKPTELLEENTEE 1906

Query: 666 RAELQCCSTQLESSLN-KYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQT 724 

+++ +L S++ ++N + + +E + ++ LQ L + L+KE H + 

Sbjct: 1907 KSKSHLVQPKLLSNMEAQHNDLEFKLAGAEREKQKLGKEIVRLQKDL-RMLRKE-HQQEL 1964 

Query: 725 TITKEAYDALSRKSAACQDDLTQALEKLNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYE 784 

I K+ YD R+ Q+ + LE L H H + +++ TQ +K+ +LE I + 
Sbjct: 1965 EILKKEYDQ-EREEKIKQEQ--EDLE-LKHNSTLKQLMREFNTQLAQKEQELEMTI K 2017 

Query: 785 ERMKKLNTELRKLRGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLR 844 

E + K +L HQE E + KK+ E + + K+++ ++L A+EE++ 
Sbjct: 2018 ET I NKAQEVEAELLESHQE ETNQLLKKIAEKDDDLKRTAKRYE EILDAREEEMT 2071

Query: 845 EFQEEMAALKENLLEDDKEPCCLPQWSVP-KDTCRLYRGNDQIMTNLEQWAKQQKVANEK 903

+ + EL+++ LQ PD + ++TLQK +++ K 

Sbjct: 2072 AKVRDLQTQLEELQKKYQQK--LEQEENPGNDNVTIM ELQTQLAQ--KTTLISDSK 2123

Query: 904 LGNQ-LREQVNYIA-KLSGEKDHLHSVMV-HL 932 

L Q REQ++ + +L + ++++ V HL 
Sbjct: 2124 LKEQEFREQIHNLEDRLKKYEKNVYATTVGHL 2155 

Score = 280 (42.0 bits), Expect = 5.2e-20, P = 5.2e-20
Identities = 209/938 (22%), Positives = 432/938 (46%)

Query: 3 DEAGERDREVS-SLNSKLLSLQLDIKN-LHDVC-KRQRKTLQDNQLCMEEAM-NSSHDKK 58 

++ ++ +E+ +L KLL + +K L + + +K Q N +E A NS+ 
Sbjct: 957 EKVKQKAKEMQETLKKKLLDQEAKLKKELENTALELSQKEKQFKAKMLEMAQANSAGISD 1016 

Query: 59 QAQALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSIL 118 

L + E + S + H R+L + + + +++L EELQ + ++ + 
Sbjct: 1017 AVSRLETNQKE-QIESLTEVHRRELNDV ISIWEKKLNQQAEELQ-EIHEIQLQEK — 1069 

Query: 119 EKQTSDLV— LLHHHCKLKE-DEVILYEEEMGNHNENTGEKLHLAQEQLALAGDKIASLE 175 

E++ ++L +L C+ +E ++ I + +E G + T +L +Q + + +A E 
Sbjct: 1070 EQEVAELKQKILLFGCEKEEMNKEITWLKEEGVKQDTTLNELQEQLKQKSAHVNSLAQDE 1129 

Query: 176 RSLNLYRDKYQSSLSNIELLECQVKMLQGELGGI--MGQEPENKGDHSKVRIYTSPCMIQ 233
L ++K+ LN LE LQ +L + + +E + K ++ T+ Q
Sbjct: 1130 TKLKAHLEKLEVDL-NKSLKENT--FLQEQLVELKMLAEEDKRKVSELTSKLKTTDEEFQ 1186

Query: 234 E HQETQKRLSEVWQKVSQQDDLIQELRNKL — AC — SNALVLEREKALIKLQADFA 285 

H+++ K L + K + L +EL +L C + AL+ + LI + + 
Sbjct: 1187 SLKSSHEKSNKSLED— KSLEFKKLSEELAIQLDICCKKTEALLEAKTNELINISSSKT 1243 

Query: 286 SCTATH-RYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKR 344 

+ + + + + ++I + ++Q + E QN + + E+K 

Sbjct: 1244 NAILSRISHCQHRTTKVKEALLIKTCTVSELEAQLRQLTEEQNTLNISFQQATHQLEEKE 1303

Query: 345 NIMKDMMKLELD-LHGLREETSAHIERKDKDITILQCRLQELQLEFTET-QKLTLKKDKF 402

N +K M K +++ L +E + + + + + +L+ E +E +TL K++ 

Sbjct: 1304 NQIKSM-KADIESLVTEKEALQKEGGNQQQAASEKESCITQLKKELSENINAVTLMKEE- 1361 

Query: 403 LQEKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQ 462 

L+EK + L K+LT + N L+ L +++ +L EK+ ++ L 

Sbjct: 1362 LKEKKVEISSLSKQLTDL-NVQLQNSISLSEKEAAISSLRKQYDEEKCELLDQVQ— DLS 1418 

Query: 463 AEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKA 518

+V L A +Q + + + + K++A ++T ++LQ L L + +A 

Sbjct: 1419 FKVDTLSKEKISALEQVDDWSNKFSEWKKKAQSRFTQHQNTVKELQIQLELKSKEAYEKD 1478

Query: 519 DTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQLQKTVA 578 

+ I L+ EL K + E ++ ++E+ L +L++ +L+ + 

Sbjct: 1479 EQINLLKEELDQQNKRFDCLKGEMEDDKSKMEKKESNLET ELKSQTARIMELEDHIT 1535 

Query: 579 EQDMKMNDMLDRIKHQHREQGSIKCK-LEEDLQEATKLLEDKREQLKKSKEHEKLMEGEL 637

++ + ++ + + +K+ + +Q I+ K L + LQ +L E+K ++K+++E +E ++
Sbjct: 1536 QKTIEIESLNEVLKN-YNQQKDIEHKELVQKLQHFQELGEEKDNRVKEAEEKILTLENQV 1594

Query: 638 EALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLES-SLNKYNTSQQVIQDLNKE 696 

+++ E+KKL+ + ++ + E L+A L+ +LES S K ++ + ++ 

Sbjct: 1595 YSMKAELETKKKELEHVNLSVKSKEEELKA-LE--DRLESESAAKL AELKRKAEQK 1647

Query: 697 IALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDDLTQALEKLNHVT 756 

IA K+ L+S Q++ +KE+ Y + T + L+ K + ++ EKL V 

Sbjct: 1648 IAAI KKQLLS QME EKEEQYKKGT--ESHLSELNTKLQEREREVHILEEKLKSVE 1699 

Query: 757 S ET KSLQQSLTQTQEKKAQLEEEI I -AYEERMKKLNTELRKLRGFHQESELEV 808

S ET +S + T++++A + + YEE++ L L E E + 

Sbjct: 1700 SSQSETLIVPRSAKNVAAYTEQEEADSQGCVQKTYEEKISVLQRNLT EKEKLL 1752

Query: 809 HAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLREFQEEMAALKENLLEDDKEPCCLP 868 

++ EE + + Q+Q L L E + E Q + L+E L E +K+ + 

Sbjct: 1753 QRVGQEKEETVSSHFEMRCQYQERLIKLEHAEAKQHEDQSMIGHLQEELEEKNKKYSLIV 1812

Query: 869 QWSVPKDTCRLYRGNDQIMTNLEQ-WAKQQKVANEK-LGNQLREQ-VNYIAKLSGEKDHL 925 

V K+ + N Q NLE + QK EK L Q+ EQ + + + + 

Sbjct: 1813 AQHVEKEGGK NNIQAKQNLENVFDDVQKTLQEKELTCQILEQKIKELDSCLVRQKEV 1869

Query: 926 HSV-MVHLQQENKKLK 940 

H V M L + +KL+ 
Sbjct: 1870 H RVEMEELTSK YEKLQ 1885

Score = 227 (34.1 bits), Expect = 2.5e-14, P = 2.5e-14
Identities = 160/716 (22%), Positives = 318/716 (44%)

Query: 233 QEHQETQKRLSEVWQKVSQQDDLIQE-LRNKLACSNALV-LEREKALIKL-QADFASCTA 289 

+E +TQ ++ +V + L + ++ L S++ LR+ L + DSTA 
Sbjct: 53 RESGDTQSFAQKLQLRVPSVESLFRSPIKESLFRSSSKESLVRTSSRESLNRLDLDSSTA 112 

Query: 290 THRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRNIMKD 349 

+ P E ED+ L +++ QL + + R+ + + + ++ 

Sbjct: 113 SFDPPSDMDSEAEDLVGNSDSLNKEQLIQRLR--RMERSLSSYRGKYSELVTAYQMLQRE 170

Query: 350 MMKLELDLHGLREETSAHIERKDKDIT-ILQCRLQELQLEFTETQKLTLKKDKFLQEKDE 408 

KL+ G+ ++ +DK + I + R +ELQ++ + L + D L+EKD+ 

Sbjct: 171 KKKLQ GILSQS QDKSLRRIAELR-EELQMDQQAKKHLQEEFDASLEEKDQ 219

Query: 409 MLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAE V 465

+ L+ + + + ++ L ++ + + +LE + ++++ E++ + + + V 

Sbjct: 220 YISVLQTQVSLLKQRLRNGPMNVDVLKPLP-QLEPQAEVFTKEENPESDGEPVVEDGTSV 278 

Query: 466 QKLKNSLEEAKQQERLA — AQQAAQC-KEEAALAGCHLEDTQRKLQKGLL-LDKQKADTI 521 

+ L+ + K+QE L ++ Q KE+ L E Q +L + L L+K K + 

Sbjct: 279 KTLETLQQRVKRQENLLKRCKETIQSHKEQCTLLTSEKEALQEQLDERLQELEKIKDLHM 338 

Query: 522 QELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQLQKTVAEQD 581 

E++L+ ++E++ +E ++EL E +R K+Q 

Sbjct: 339 AEKTKLITQLRDAKNLIEQLEQDKGMVIAETKRQMHETLEMKEEEIAQLRSRIKQMTTQG 398

Query: 582 ... --KSKEHEKL-MEGE 636
           ... + +EA KL + EQ+K ... K+ E E++ ++ E
Sbjct: 399 ...

Query: 637 ...
Sbjct: 459 ...

Query: 696 ...
           ... +AL+K ... DL Q E
Sbjct: 513 ...

Query: 755 ...
           ... SL++SL ... QE K Q ++ + E K N E+ + H+ +ELE H D
Sbjct: 569 ... -QENKNQSKDLAVHLEAEKNKHNKEITVMVEKHK-TELESLKHQQD 624

Query: 813 ... -FQEEMAALKENLLED-DK 862
           ... QVL+ +Q+Q +++ L K EQ +E ... FQ + + E LE D
Sbjct: 625 ...

Query: 863 ...
Sbjct: 682 ...

Query: 923 ...
           H V + Q+ K LK +I + ++
Sbjct: 740 IKEHEVSI--QRTEKALKDQINQLEL 763

Score = 183 (27.5 bits), Expect = 1.3e-09, P = 1.3e-09
Identities = 132/584 (22%), Positives = 251/584 (42%)

Query: 409 MLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAK-QDKSKEAECKALQAEVQK 467

M ++L++K+++ QL+ + +TM++++E +Q 

Sbjct: 1 KFKKLKQKISEEQQQLQQALAPAQASSNSSTPTRMRSRTSSFTEQLDEGTPNRESGDTQS 60 

Query: 468 LKNSLE-EAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKA--DTIQEL 524

L+ EL + ++ + + R+ L LD AD ++ 

Sbjct: 61 FAQKLQLRVPSVESLFRSPIKESLFRSSSKESLVRTSSRESLNRLDLDSSTASFDPPSDM 120 

Query: 525 QRELQMLQKESSMAEKEQTSNRKRVEELSL ELSEALRKLENSDKEKRQLQKTVAE 579 

E + L S KEQ R R E SL + SE + + +EK++LQ +++ 

Sbjct: 121 DSEAEDLVGNSDSLNKEQLIQRLRRMERSLSSYRGKYSELVTAYQMLQREKKKLQGILSQ 180 

Query: 580 -QDMKMNDMLDRIKHQHREQGSIKCKLEE DLQEATK LLEDKREQLKKSKEHEKL 632 

QD + + + + +Q + K EE L+E + +L+ + LK+ + + 
Sbjct: 181 SQDKSLRRIAELREELQMDQQAKKHLQEEFDASLEEKDQYISVLQTQVSLLKQRLRNGPM 240 

Query: 633 MEGELEALRQ-EFKKKDKTLKENSRKLEE ENENLRAELQCCSTQLESSLNKYNTSQQ 688 

L+ L Q E + + T +EN E E+ L+ +++ N ++ 

Sbjct: 241 NVDVLKPLPQLEPQAEVFTKEENPESDGEPVVEDGTSVKTLETLQQRVKRQENLLKRCKE 300 

Query: 689 VIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDDLTQA 748 

IQ ++ L +LQ QLD+ LQ E ++ E +++ A +L + 

Sbjct: 301 TIQSHKEQCTLLTSEKEALQEQLDERLQ-ELEKIKDLHMAEKTKLITQLRDA— KNLIEQ 357 

Query: 749 LEK-LNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELRKLRGFHQESELE 807 

LE+ V +ETK + + +T E K EEEI R+K++ T+ +LR Q+ + E 

Sbjct: 358 LEQDKGMVIAETK RQMHETLEMK EEEIAQLRSRIKQMTTQGEELR--EQKEKSE 409 

Query: 808 VHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLREFQ EEMAALKENLLEDDKE 863 

AF EE+ + QK + K+A +EQ++ + EE +L++ L +E 
Sbjct: 410 RAAF EELEKALSTAQKTEEARRKLKAEMDEQIKTIEKTSEEERISLQQELSRVKQE 465 

Query: 864 PCCLPQWSVPKDTCRLYRGNDQIMTNLEQ-WAKQQKVANEKLGNQLR EQVNYIAK 917 

+ + S + +L + +++ + EQ K+ + + Q++ Q Y+ K 

Sbjct: 466 WDVMKKSSEEQIAKLQKLHEKELARKEQELTKKLQTREREFQEQMKVALEKSQSEYL-K 524

Query: 918 LSGEKDHLHSVMVH-LQQENKKLKKEIEEK KMKAENTRLCTKALGPSRTESTQREK 972 

+S EK+ S++ L++K+ EEK + +AE R L S +S Q K 

Sbjct: 525 ISQEKEQQESLALEELELQKKAILTESENKLRDLQQEAETYRTRILELESSLEKSLQENK 584 



Pedant information for DKFZphtes3_lgl3, frame 1 



Report for DKFZphtes3_lgl3.1

[LENGTH] 1007
[MW] 117480.77
[pI] 5.90
[HOMOL] TREMBL:AF092090_1 product: "cp151"; Rattus norvegicus cp151 mRNA, partial cds 0.0
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 5e-15
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 5e-15
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YDR356w] 1e-11
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YDR356w] 1e-11
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YDR356w] 1e-11
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR095w] 1e-08
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 1e-08
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR309c] 1e-08
[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1322] 4e-06
[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YLR086w] 9e-06
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-04
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-04
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-04
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YJR134c] 5e-04
[EC] 3.6.1.32 Myosin ATPase 1e-16
[PIRKW] nucleus 3e-10
[PIRKW] phosphotransferase 6e-09
[PIRKW] duplication 2e-06
[PIRKW] citrulline 2e-12
[PIRKW] tandem repeat 1e-16
[PIRKW] endocytosis 2e-13
[PIRKW] heart 8e-13
[PIRKW] transmembrane protein 1e-13
[PIRKW] serine/threonine-specific protein kinase 6e-09
[PIRKW] zinc finger 2e-13
[PIRKW] metal binding 2e-13
[PIRKW] DNA binding 4e-12
[PIRKW] muscle contraction 1e-16
[PIRKW] acetylated amino end 1e-11
[PIRKW] actin binding 1e-16
[PIRKW] mitosis 5e-15
[PIRKW] microtubule binding 5e-15
[PIRKW] ATP 1e-16
[PIRKW] thick filament 1e-16
[PIRKW] phosphoprotein 4e-16
[PIRKW] skeletal muscle 2e-14
[PIRKW] calcium binding 2e-12
[PIRKW] alternative splicing 1e-16
[PIRKW] coiled coil 1e-16
[PIRKW] P-loop 1e-16
[PIRKW] heptad repeat 3e-10
[PIRKW] methylated amino acid 1e-16
[PIRKW] immunoglobulin receptor 2e-06
[PIRKW] peripheral membrane protein 2e-13
[PIRKW] cardiac muscle 8e-13
[PIRKW] hydrolase 1e-16
[PIRKW] microtubule 3e-10
[PIRKW] muscle 8e-13
[PIRKW] EF hand 2e-12
[PIRKW] cytoskeleton 2e-15
[PIRKW] hair 2e-12
[PIRKW] calmodulin binding 2e-13
[PIRKW] Golgi apparatus 3e-10
[SUPFAM] myosin heavy chain 1e-16
[SUPFAM] conserved hypothetical P115 protein 1e-07
[SUPFAM] centromere protein E 5e-15
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 6e-09
[SUPFAM] calmodulin repeat homology 2e-12
[SUPFAM] myosin motor domain homology 1e-16
[SUPFAM] alpha-actinin actin-binding domain homology 2e-07
[SUPFAM] plectin 2e-07
[SUPFAM] trichohyalin 2e-12
[SUPFAM] pleckstrin repeat homology 8e-08
[SUPFAM] ribosomal protein S10 homology 2e-07
[SUPFAM] giantin 3e-13
[SUPFAM] protein kinase homology 6e-09
[SUPFAM] protein kinase C zinc-binding repeat homology 8e-08
[SUPFAM] kinesin motor domain homology 5e-15
[SUPFAM] human early endosome antigen 1 2e-13
[SUPFAM] M5 protein 1e-07
[PROSITE] LEUCINE_ZIPPER 7
[PROSITE] MYRISTYL 2
[PROSITE] CAMP_PHOSPHO_SITE 2
[PROSITE] CK2_PHOSPHO_SITE 20
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] PKC_PHOSPHO_SITE 16
[PROSITE] ASN_GLYCOSYLATION 2
[KW] All_Alpha
[KW] LOW_COMPLEXITY 15.00 %
[KW] COILED_COIL 42.40 %

SEQ MKDEAGERDREVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKKQA 

SEG xxxxxxxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCC

SEQ QALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSILEK

SEG xxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

SEQ QTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQLALAGDKIASLERSLNL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ YRDKYQSSLSNIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIYTSPCMIQEHQETQK 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCC 

SEQ RLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASCTATHRYPPSSSEE 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 

SEQ CEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRNIMKDMMKLELDLHGL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

SEQ REETSAHIERKDKDITILQCRLQELQLEFTETQKLTLKKDKFLQEKDEMLQELEKKLTQV 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCC CCCCCCCCCCCCCC

SEQ QNSLLKKEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAEVQKLKNSLEEAKQQER 

SEG . . .xxxxxxxxxx xxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ LAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKADTIQELQRELQMLQKESSMAEK 

SEG xxxxxxxxxxxxxxxx xxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCC 

SEQ EQTSNRKRVEELSLELSEALRKLENSDKEKRQLQKTVAEQDMKMNDMLDRIKHQHREQGS 

SEG xxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ IKCKLEEDLQEATKLLEDKREQLKKSKEHEKLMEGELEALRQEFKKKDKTLKENSRKLEE 

SEG xxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCC 

SEQ ENENLRAELQCCSTQLESSLNKYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKH 

SEG xxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ YLQTTITKEAYDALSRKSAACQDDLTQALEKLNHVTSETKSLQQSLTQTQEKKAQLEEEI 

SEG xxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCC

SEQ IAYEERMKKLNTELRKLRGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKE 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCC 

SEQ EQLREFQEEMAALKENLLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVA 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS

SEQ NEKLGNQLREQVNYIAKLSGEKDHLHSVMVHLQQENKKLKKEIEEKKMKAENTRLCTKAL 

SEG xxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

SEQ GPSRTESTQREKVCGTLGWKGLPQDMGQRMDLTKYIGMPHCPGSSYC

SEG 

PRD cchhhhhhhhhhhhhhhhcccccccccchhhhhheeecccccccccc 

COILS 



Prosite for DKFZphtes3_lgl3.1

PS00001 52->56 ASN_GLYCOSYLATION PDOC00001
PS00001 684->688 ASN_GLYCOSYLATION PDOC00001
PS00004 240->244 CAMP_PHOSPHO_SITE PDOC00004
PS00004 415->419 CAMP_PHOSPHO_SITE PDOC00004
PS00005 74->77 PKC_PHOSPHO_SITE PDOC00005
PS00005 110->113 PKC_PHOSPHO_SITE PDOC00005
PS00005 238->241 PKC_PHOSPHO_SITE PDOC00005
PS00005 290->293 PKC_PHOSPHO_SITE PDOC00005
PS00005 392->395 PKC_PHOSPHO_SITE PDOC00005
PS00005 396->399 PKC_PHOSPHO_SITE PDOC00005
PS00005 444->447 PKC_PHOSPHO_SITE PDOC00005
PS00005 503->506 PKC_PHOSPHO_SITE PDOC00005
PS00005 544->547 PKC_PHOSPHO_SITE PDOC00005
PS00005 566->569 PKC_PHOSPHO_SITE PDOC00005
PS00005 600->603 PKC_PHOSPHO_SITE PDOC00005
PS00005 650->653 PKC_PHOSPHO_SITE PDOC00005
PS00005 655->658 PKC_PHOSPHO_SITE PDOC00005
PS00005 735->738 PKC_PHOSPHO_SITE PDOC00005
PS00005 876->879 PKC_PHOSPHO_SITE PDOC00005
PS00005 968->971 PKC_PHOSPHO_SITE PDOC00005
PS00006 39->43 CK2_PHOSPHO_SITE PDOC00006
PS00006 53->57 CK2_PHOSPHO_SITE PDOC00006
PS00006 68->72 CK2_PHOSPHO_SITE PDOC00006
PS00006 116->120 CK2_PHOSPHO_SITE PDOC00006
PS00006 190->194 CK2_PHOSPHO_SITE PDOC00006
PS00006 250->254 CK2_PHOSPHO_SITE PDOC00006
PS00006 296->300 CK2_PHOSPHO_SITE PDOC00006
PS00006 439->443 CK2_PHOSPHO_SITE PDOC00006
PS00006 444->448 CK2_PHOSPHO_SITE PDOC00006
PS00006 471->475 CK2_PHOSPHO_SITE PDOC00006
PS00006 520->524 CK2_PHOSPHO_SITE PDOC00006
PS00006 536->540 CK2_PHOSPHO_SITE PDOC00006
PS00006 566->570 CK2_PHOSPHO_SITE PDOC00006
PS00006 576->580 CK2_PHOSPHO_SITE PDOC00006
PS00006 650->654 CK2_PHOSPHO_SITE PDOC00006
PS00006 674->678 CK2_PHOSPHO_SITE PDOC00006
PS00006 804->808 CK2_PHOSPHO_SITE PDOC00006
PS00006 888->892 CK2_PHOSPHO_SITE PDOC00006
PS00006 963->967 CK2_PHOSPHO_SITE PDOC00006
PS00006 968->972 CK2_PHOSPHO_SITE PDOC00006
PS00007 135->143 TYR_PHOSPHO_SITE PDOC00007
PS00008 207->213 MYRISTYL PDOC00008
PS00008 599->605 MYRISTYL PDOC00008
PS00029 83->105 LEUCINE_ZIPPER PDOC00029
PS00029 90->112 LEUCINE_ZIPPER PDOC00029
PS00029 97->119 LEUCINE_ZIPPER PDOC00029
PS00029 104->126 LEUCINE_ZIPPER PDOC00029
PS00029 403->425 LEUCINE_ZIPPER PDOC00029
PS00029 410->432 LEUCINE_ZIPPER PDOC00029
PS00029 918->940 LEUCINE_ZIPPER PDOC00029
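The hit coordinates in the Prosite listing can be cross-checked against the peptide itself. The following is a minimal sketch, not the tool that produced the report: it assumes the standard PROSITE consensus patterns for PS00001, PS00005, PS00006 and PS00029, rewrites them as regular expressions, and scans only the first 120 residues of the SEQ lines. The names `SEQ_1_120`, `PATTERNS` and `scan` are illustrative. The listing appears to give a 1-based start and an end one past the last matched residue, and the sketch follows that convention.

```python
import re

# First 120 residues of the DKFZphtes3_lgl3.1 peptide, copied from the SEQ lines.
SEQ_1_120 = (
    "MKDEAGERDREVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKKQA"
    "QALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSILEK"
)

# PROSITE consensus patterns rewritten as regular expressions (assumed forms):
#   PS00001  N-{P}-[ST]-{P}
#   PS00005  [ST]-x-[RK]
#   PS00006  [ST]-x(2)-[DE]
#   PS00029  L-x(6)-L-x(6)-L-x(6)-L
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",
    "LEUCINE_ZIPPER": r"L.{6}L.{6}L.{6}L",
}


def scan(seq, regex):
    """Return (start, end) pairs for all matches, including overlapping ones.

    start is 1-based; end is start plus the match length, which mirrors
    the 'start->end' notation of the listing.
    """
    hits = []
    # A zero-width lookahead lets matches overlap (needed for the zippers).
    for m in re.finditer(f"(?=({regex}))", seq):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits


if __name__ == "__main__":
    for name, regex in PATTERNS.items():
        for start, end in scan(SEQ_1_120, regex):
            print(f"{name} {start}->{end}")
```

Within this 120-residue window the sketch reports the same positions as the listing. Hits that start later in the protein, or that extend past residue 120 (such as the zipper at 104), are outside its scope.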



(No Pfam data available for DKFZphtes3_lgl3.1)



DKFZphtes3_lkll 



group: cell structure and motility
DKFZphtes3_lkll encodes a novel 589 amino acid protein with strong similarity to Mus musculus
actin-binding protein (ENC-1).

Ectoderm-neural cortex-1 protein (ENC-1) is an early and highly specific marker of neural
induction in vertebrates. The protein is related to the kelch family proteins and is expressed
during early gastrulation in the prospective neuroectodermal region of the epiblast and later
in development throughout the nervous system (NS). ENC-1 functions as an actin-binding protein
organising the actin cytoskeleton during neural differentiation and development of the NS.
The novel protein is highly similar to ENC-1.

The new protein can find application in modulation of cytoskeleton organisation in human
testicular cells.



strong similarity to mouse ENC-1 

complete cDNA, complete cds, EST hits

Sequenced by DKFZ 

Locus : unknown 

Insert length: 3525 bp 

Poly A stretch at pos. 3515, polyadenylation signal at pos. 3499 



1 GGTGGAGAGC CGGCCGACGG GAGCCGCGGC GGAGCCTGTT GAGCTCGCGC
51 GGGCTGCCGG GAGTGGTCTC TGAGGCGGCG GCGGCGGCGG GGATCGTCTC
101 CGGCACTGGC GCACCATGTC GGTCAGTGTC CATGAGACCC GCAAGTCGCG
151 GAGCAGCACG GGGTCCATGA ACGTCACCCT CTTCCACAAG GCCTCCCACC
201 CGGACTGTGT GCTGGCCCAC CTCAACACGC TTCGCAAGCA CTGCATGTTC
251 ACCGACGTCA CACTCTGGGC GGGCGACCGT GCCTTCCCCT GTCACCGTGC
301 CGTGCTGGCC GCCTCTAGCC GCTATTTTGA GGCCATGTTC AGCCATGGCC
351 TTCGGGAGAG CCGGGATGAC ACTGTCAACT TCCAGGACAA CCTGCACCCG
401 GAGGTGCTGG AGCTGCTGCT GGACTTTGCC TACTCCTCAC GCATCGCCAT
451 CAACGAGGAG AACGCTGAGT CACTGCTGGA GGCAGGCGAC ATGCTGCAGT
501 TCCACGATGT GCGGGATGCT GCCGCCGAGT TCCTGGAGAA GAACCTTTTC
551 CCCTCCAACT GCCTGGGCAT GATGCTGCTC TCGGACGCCC ACCAGTGCCG
601 CCGGCTGTAT GAGTTCTCCT GGCGCATGTG CCTGGTGCAC TTTGAGACGG
651 TGAGGCAGAG CGAGGACTTC AACAGCCTGT CCAAGGACAC ACTGCTGGAC
701 CTCATCTCGA GTGATGAGCT GGAGACCGAG GACGAGCGGG TGGTCTTCGA
751 GGCCATCCTC CAGTGGGTGA AGCACGACCT GGAGCCACGG AAGGTCCACT
801 TGCCCGAGCT CCTCCGCAGC GTGCGTCTGG CCTTGCTGCC GTCCGACTGC
851 CTGCAGGAGG CCGTCTCCAG CGAGGCCCTC CTCATGGCAG ACGAGCGCAC
901 CAAGCTTATC ATGGATGAGG CCCTGCGCTG CAAGACCAGG ATCCTGCAGA
951 ATGATGGCGT GGTCACCAGC CCCTGTGCCC GGCCACGCAA GGCGGGCCAC
1001 ACGCTACTCA TCCTGGGGGG CCAGACCTTC ATGTGTGACA AGATCTACCA
1051 GGTGGACCAC AAGGCCAAGG AGATCATCCC CAAGGCCGAC CTGCCCAGCC
1101 CCCGGAAGGA GTTCAGCGCC TCAGCGATCG GCTGCAAGGT CTATGTGACG
1151 GGGGGCAGGG GCTCCGAGAA CGGGGTCTCC AAGGATGTCT GGGTGTACGA
1201 CACCGTACAT GAGGAATGGT CCAAGGCGGC GCCCATGCTG ATTGCCCGCT
1251 TTGGCCATGG CTCAGCTGAG CTGGAGAACT GCCTCTATGT GGTGGGGGGA
1301 CACACATCCC TGGCAGGGGT CTTCCCGGCC TCGCCTTCTG TCTCCCTGAA
1351 ACAAGTGGAG AAATACGACC CTGGGGCCAA CAAGTGGATG ATGGTGGCCC
1401 CCTTGCGGGA TGGCGTCAGC AATGCCGCAG TGGTGAGTGC CAAGCTGAAG
1451 CTCTTTGTTT TCGGAGGAAC CAGCATCCAC CGGGACATGG TGTCCAAGGT
1501 CCAGTGCTAT GACCCCTCGG AGAACAGGTG GACGATCAAG GCCGAGTGCC
1551 CCCAGCCTTG GCGGTACACA GCCGCTGCCG TCCTGGGCAG CCAGATCTTC
1601 ATCATGGGAG GTGACACGGA ATTCACAGCC GCCTCGGCCT ACCGCTTTGA
1651 CTGTGAGACC AACCAGTGGA CGCGGATTGG GGACATGACT GCCAAGCGCA
1701 TGTCCTGCCA TGCCCTGGCT TCCGGCAACA AGCTCTATGT GGTCGGGGGC
1751 TACTTTGGGA CCCAGAGGTG TAAGACTCTG GACTGCTATG ACCCCACTTC
1801 AGATACATGG AACTGCATCA CCACAGTGCC CTACTCACTT ATCCCCACGG
1851 CCTTTGTCAG CACCTGGAAG CACCTGCCCG CGTGAGGAGC ACCTGCTGAG
1901 CCCAGCCAGA CCGCGGCCTT CAGTGTCACA GCGTGGCCTT GCTTGTCTGC
1951 CACAGCGGGA GCTAAGCCGG CCCTGGGCCA GCACTCCGAG AGGTGGAAGG
2001 GGCCCTGCCA GCTCTGGGGA GCAGCAGCCT TGGGCTGTTC TGAGCTTTAG
2051 GCAAGAGAAG AGAAGCATCT CTTGCATCCG TGCCCCTGGG GGCCTCTTCA
2101 GCTTTGCAGT GGTTTGTGGG AAGACATACC TCCCAGAGGG GCATGGACTG
2151 CCACCAGGAC TGACCCTGGC GTCGGGGAGA AGGACACTTG CAGAGCCTTG
2201 AGATCACCTG TTTGGCAGGT CCTGGACTGG GGCCGGGCAG GCAGGGGCAG
2251 GGAGGCGCCC CGGGTGGGCT TTGGGGCTGC GGCACTGCCA CACATCCTTT
2301 CCCTCCTGGC CTGCCCTGCT GGGGCTCTAC TGCCATCTAT AGATGGTGTC
2351 CTGGGCCTGG GAAACTAGGT TCCCAGGGGT TGAGACCAGA AAGGTGACCA
2401 AGACAGATTT TTTAAGGTGC AGAAACTGCA GGGGGGCCTC AGTGACATCC
2451 ATGAGGCCTT ATTAGCAAAG GACACCCAGA CCTCCAAGGT TTGTGGGCCC
2501 CTTCCACAAA GCTGTAAGTC CCAGCCCACC TACTCAGGGC CTTGCTCAGT
2551 GCTGTGGCCC GGTGGGGACA CAGTTGCTCG TGGCCACTCA GTGGAGCTGG
2601 GCCTGCAGCA GACTCAAGGC TCCGAGTGCC CTGGGGGTCA CCCCTCCCCT
2651 CCCCTCCTCA GAGCCCACCC TGAGAGGCAG CAGTGACCCC CATGGCACAC
2701 ACCTGCCAAC AGCACTGGGG GCTTCTCCCC AGGAGACCAC GCTGCCCTCC
2751 AAGACCAGGA GCAGCTGTGA GCTGGAGACA GCAGAGGGAC CCCAGGGTGT
2801 CCCCTGCAGA TCCCACCAGG GCCGCATCCA TCTCAGTGTG GAGGACAGTG 
2851 ACGGGACCCT CACCATCCTC TTGCGTTTTG GCCCCCATTT GCTCCCTGAG 
2901 CTCCAAGATA AGAATGGCCC CGAGAGAACT GCTGAACATT TGTTCATTGC 
2951 TGTCACCTCC TGAGTCACTG GGGTCCCTCA CCAGCACCTC CCTGACACCT 
3001 GGGCTATGGA GAGGTTGGCG CCTGTCAGTG ACCATCCTAA TGCCTCTCGC 
3051 TCACTCCCAA GCCACCATTT GAGAGGGAGG GGTGTTGGTG CCCTGACAGG 
3101 GACTGGGCAG GGTGTCCAAA CTTGGGGCTT CCCAGGCACC TGCAGTGTGA 
3151 ACACTGCTTG GCTGGCTCAA GATTAGGGCC GCGGAGGGGG CTGTGCACAT 
3201 ACCAGTTACT TAAGCAGCCA CGAGTGTCCC CCATGCCTTG GTGCGGGTCC 
3251 TGGAGGCCTC TTGGGGGTGG GACCTTTGGG CAGGGTTTGC CCACTGACGC 
3301 GCCCGCCATG GGGCACTGGC TGCATGGGGC TCCTTGGACC CTGTAGAGCC 
3351 AGCAGGAGCC TGGCCGCGGG GACTGCAGGG AGGGTGCCTG GACCCGTGGG 
3401 GTTGCTTCAT TGAGATAAAG CACACTTATC ACATAGCACA AAGGACGTGC 
3451 CATGGTGCTT TCCCCAAAAG TTGTGTTGCT TTTATCAGTT TTCTAACTTA 
3501 ATAAAAAGAG TTGAGAAAAA AAAAA 



BLAST Results 



No BLAST result 



Medline entries 



98350113:

Cloning of human ENC-1 and evaluation of its expression
and regulation in nervous system tumors.

97252647:

ENC-1: a novel mammalian kelch-related gene specifically expressed in
the nervous system encodes an actin-binding protein.

98234394:

NRP/B, a novel nuclear matrix protein, associates with
p110(RB) and is involved in neuronal differentiation.



Peptide information for frame 2 



ORF from 116 bp to 1882 bp; peptide length: 589 
Category: strong similarity to known protein 
Classification: Cell structure/motility 
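The ORF coordinates and the stated peptide length can be checked arithmetically: 1882 - 116 + 1 = 1767 nucleotides, which is exactly 589 codons, so the quoted range appears to exclude the stop codon. The sketch below illustrates this under stated assumptions. It uses the standard genetic code and two 50-nucleotide windows copied from the cDNA listing (positions 101-150 and 1851-1900); `translate` and `orf_slice` are hypothetical helper names, not part of any tool named in this document.

```python
# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

ORF_START, ORF_END = 116, 1882  # 1-based, inclusive, as stated in the text

# Two windows of the cDNA, copied from the sequence listing.
WINDOW_101 = "CGGCACTGGCGCACCATGTCGGTCAGTGTCCATGAGACCCGCAAGTCGCG"   # 101-150
WINDOW_1851 = "CCTTTGTCAGCACCTGGAAGCACCTGCCCGCGTGAGGAGCACCTGCTGAG"  # 1851-1900


def translate(dna):
    """Translate whole codons of dna; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def orf_slice(window, window_start, start=ORF_START, end=ORF_END):
    """Return the part of window inside [start, end], beginning on a codon boundary."""
    first = max(start, window_start)
    first += (-(first - start)) % 3          # advance to the next in-frame position
    last = min(end, window_start + len(window) - 1)
    return window[first - window_start:last - window_start + 1]


n_codons = (ORF_END - ORF_START + 1) // 3
n_term = translate(orf_slice(WINDOW_101, 101))
c_term = translate(orf_slice(WINDOW_1851, 1851))
after_orf = WINDOW_1851[ORF_END + 1 - 1851:ORF_END + 4 - 1851]

if __name__ == "__main__":
    print(n_codons, n_term, c_term, after_orf)
```

Under these assumptions the first window translates to the start of the peptide listing (MSVSVHETRKS...), the second to its end (...FVSTWKHLPA), and the three nucleotides following position 1882 are TGA.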



1 MSVSVHETRK SRSSTGSMNV TLFHKASHPD CVLAHLNTLR KHCMFTDVTL 
51 WAGDRAFPCH RAVLAASSRY FEAMFSHGLR ESRDDTVNFQ DNLHPEVLEL 
101 LLDFAYSSRI AINEENAESL LEAGDMLQFH DVRDAAAEFL EKNLFPSNCL
151 GMMLLSDAHQ CRRLYEFSWR MCLVHFETVR QSEDFNSLSK DTLLDLISSD
201 ELETEDERVV FEAILQWVKH DLEPRKVHLP ELLRSVRLAL LPSDCLQEAV
251 SSEALLMADE RTKLIMDEAL RCKTRILQND GVVTSPCARP RKAGHTLLIL
301 GGQTFMCDKI YQVDHKAKEI IPKADLPSPR KEFSASAIGC KVYVTGGRGS
351 ENGVSKDVWV YDTVHEEWSK AAPMLIARFG HGSAELENCL YVVGGHTSLA
401 GVFPASPSVS LKQVEKYDPG ANKWMMVAPL RDGVSNAAVV SAKLKLFVFG
451 GTSIHRDMVS KVQCYDPSEN RWTIKAECPQ PWRYTAAAVL GSQIFIMGGD
501 TEFTAASAYR FDCETNQWTR IGDMTAKRMS CHALASGNKL YVVGGYFGTQ
551 RCKTLDCYDP TSDTWNCITT VPYSLIPTAF VSTWKHLPA 

BLASTP hits 

Entry MMU65079_1 from database TREMBL:

gene: "ENC-1"; product: "actin-binding protein"; Mus musculus
actin-binding protein (ENC-1) mRNA, complete cds.

Score = 2402, P = 1.9e-249, identities = 440/589, positives = 513/589

Entry AF059611_1 from database TREMBLNEW:

gene: "NRPB"; product: "nuclear matrix protein NRP/B"; Homo sapiens
nuclear matrix protein NRP/B (NRPB) mRNA, complete cds.

Score = 2400, P = 3.0e-249, identities = 440/589, positives = 512/589

Entry AF010314_1 from database TREMBL:

gene: "PIG10"; product: "Pig10"; Homo sapiens Pig10 (PIG10) mRNA,
complete cds.

Score = 1745, P = 7.8e-180, identities = 335/507, positives = 403/507

Entry KELC_DROME from database SWISSPROT:

RING CANAL PROTEIN (KELCH PROTEIN). >TREMBL:DMRCPA_1 product: "ring
canal protein"; Drosophila melanogaster ring canal protein and ORF2
mRNA, complete cds.

Score = 672, P = 3.9e-66, identities = 168/536, positives = 257/536



Alert BLASTP hits for DKFZphtes3_lkll, frame 2
No Alert BLAST P hits found 

Pedant information for DKFZphtes3_lkll, frame 2 

Report for DKFZphtes3_lkll.2

[LENGTH] 589

[MW] 65923.45

[pI] 6.10

[HOMOL] TREMBL:MMU65079_1 gene: "ENC-1"; product: "actin-binding protein"; Mus musculus
actin-binding protein (ENC-1) mRNA, complete cds. 0.0

[FUNCAT] 10.05.99 other pheromone response activities [S. cerevisiae, YHR158c] 2e-09

[BLOCKS] BL01016D Glycoprotease family proteins

[PIRKW] zinc finger 1e-08

[PIRKW] DNA binding 1e-08

[PIRKW] transcription factor 1e-08

[SUPFAM] POZ domain homology 3e-68

[SUPFAM] vaccinia virus 59K HindIII-C protein 1e-15

[SUPFAM] A55R protein 5e-29

[SUPFAM] hypothetical protein YHR158c 4e-08

[SUPFAM] A55R protein middle region homology 5e-29

[SUPFAM] myxoma virus M9-R protein 1e-14

[SUPFAM] A55R protein carboxyl-terminal homology 5e-29

[KW] Alpha_Beta 

SEQ MSVSVHETRKSRSSTGSMNVTLFHKASHPDCVLAHLNTLRKHCMFTDVTLWAGDRAFPCH 
PRD cccccccccccccccccceeeeeeccccchhhhhhhhhhhhhhhhheeeeeecccchhhh 

SEQ RAVLAASSRYFEAMFSHGLRESRDDTVNFQDNLHPEVLELLLDFAYSSRIAINEENAESL 
PRD hcccccccccccccccccchhhhhheeeeccccchhhhhhhhhhhhccceeehhhhhhhh 

SEQ LEAGDMLQFHDVRDAAAEFLEKNLFPSNCLGMMLLSDAHQCRRLYEFSWRMCLVHFETVR 
PRD hhhhhhhhhhhhhhhhhhhhhhhccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhh

SEQ QSEDFNSLSKDTLLDLISSDELETEDERWFEAILQWVKHDLEPRKVHLPELLRSVRLAL 
PRD hhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhc 

SEQ LPSDCLQEAVSSEALLMADERTKLIMDEALRCKTRILQNDGVVTSPCARPRKAGHTLLIL
PRD ccchhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhcccccccccccccccccceeeeee 

SEQ GGQTFMCDKIYQVDHKAKEIIPKADLPSPRKEFSASAIGCKVYVTGGRGSENGVSKDVWV 
PRD cccccccceeeeeccccccccccccccccccceeeeeeceeeeeecccccccccceeeee 

SEQ YDTVHEEWSKAAPMLIARFGHGSAELENCLYVVGGHTSLAGVFPASPSVSLKQVEKYDPG
PRD cccccccccccccccccccccceeeccceeeeecccccccccccccccccccceeecccc 

SEQ ANKWMMVAPLRDGVSNAAVVSAKLKLFVFGGTSIHRDMVSKVQCYDPSENRWTIKAECPQ
PRD ccceeeeccccccccceeeeeccceeeeeccccccccccceeeecccccccccccccccc 

SEQ PWRYTAAAVLGSQIFIMGGDTEFTAASAYRFDCETNQWTRIGDMTAKRMSCHALASGNKL 
PRD ccccceeeeecceeeeecccccccccceeecccccccceeeccccccccceeeeecccee 

SEQ YVVGGYFGTQRCKTLDCYDPTSDTWNCITTVPYSLIPTAFVSTWKHLPA 
PRD eeecccccccccccccccccccccceeeeeccccccceeeeeecccccc 

(No Prosite data available for DKFZphtes3_lkll.2)
(No Pfam data available for DKFZphtes3_lkll.2)

DKFZphtes3_ln3



group: signal transduction 

DKFZphtes3_ln3 encodes a novel 1196 amino acid protein with similarity to S. pombe Tup1
protein.

The protein contains 1 WD-40 repeat, which is typical for the beta-transducin subunit of
G-proteins. The beta subunits seem to be required for the replacement of GDP by GTP as well as
for membrane anchoring and receptor recognition. In addition, an RGD site is present.

The new protein can find application in modulating/blocking G-protein-dependent pathways.

similarity to Tup1p

complete cDNA, complete cds, EST hits
Sequenced by DKFZ
Locus: /map="6q24"
Insert length: 5277 bp 

Poly A stretch at pos. 5267, polyadenylation signal at pos. 5244 



1 GCTGCATAAA GCTGAGAGAT GCCTACAGCT GAGAGTGAAG CAAAAGTAAA
51 AACCAAAGTT CGCTTTGAAA AATTGCTTAA GACCCACAGT GATCTAATGC
101 GTGAAAAGAA AAAACTGAAG AAAAAACTTG TCAGGTCTGA AGAAAACATC
151 TCACCTGACA CTATTAGAAG CAATCTTCAC TATATGAAAG AAACTACAAG
201 TGATGATCCC GACACTATTA GAAGCAATCT TCCCCATATT AAAGAAACTA
251 CAAGTGATGA TGTAAGTGCT GCTAACACTA ACAACCTGAA GAAGAGCACG
301 AGAGTCACTA AAAACAAATT GAGGAACACA CAGTTAGCAA CTGAAAATCC
351 TAATGGTGAT GCTAGTGTAG AGGAAGACAA ACAAGGAAAG CCAAATAAAA
401 AGGTGATAAA GACGGTGCCC CAGTTGACTA CACAAGACCT GAAACCGGAA
451 ACTCCTGAGA ATAAGGTTGA TTCTACACAC CAGAAAACAC ATACAAAGCC
501 ACAGCCAGGC GTTGATCATC AGAAAAGTGA GAAGGCAAAT GAGGGAAGAG
551 AAGAGACTGA TTTAGAAGAG GATGAAGAAT TGATGCAAGC ATATCAGTGC
601 CATGTAACTG AAGAAATGGC AAAGGAGATT AAGAGGAAAA TAAGAAAGAA
651 ACTGAAAGAA CAGTTGACTT ACTTTCCCTC AGATACTTTA TTCCATGATG
701 ACAAACTAAG CAGTGAAAAA AGGAAAAAGA AAAAGGAAGT TCCAGTCTTC
751 TCTAAAGCTG AAACAAGTAC ATTGACCATC TCTGGTGACA CAGTTGAAGG
801 TGAACAAAAG AAAGAATCTT CAGTTAGATC AGTTTCTTCA GATTCTCATC
851 AAGATGATGA AATAAGCTCA ATGGAACAAA GCACAGAAGA CAGCATGCAA
901 GATGATACAA AACCTAAACC AAAAAAAACA AAAAAGAAGA CTAAAGCAGT
951 TGCAGATAAT AATGAAGATG TTGATGGTGA TGGTGTTCAT GAAATAACAA
1001 GCCGAGATAG CCCGGTTTAT CCCAAATGTT TGCTTGATGA TGACCTTGTC
1051 TTGGGAGTTT ACATTCACCG AACTGATAGA CTTAAGTCAG ATTTTATGAT
1101 TTCTCACCCA ATGGTAAAAA TTCATGTGGT TGATGAGCAT ACTGGTCAAT
1151 ATGTCAAGAA AGATGATAGT GGACGGCCTG TTTCATCTTA CTATGAAAAA
1201 GAGAATGTGG ATTATATTCT TCCTATTATG ACCCAGCCAT ATGATTTTAA
1251 ACAGTTAAAA TCAAGACTTC CAGAGTGGGA AGAACAAATT GTATTTAATG
1301 AAAATTTTCC CTATTTGCTT CGAGGCTCTG ATGAGAGTCC TAAAGTCATC
1351 CTGTTCTTTG AGATTCTTGA TTTCTTAAGC GTGGATGAAA TTAAGAATAA
1401 TTCTGAGGTT CAAAACCAAG AATGTGGCTT TCGGAAAATT GCCTGGGCAT
1451 TTCTTAAGCT TCTGGGAGCC AATGGAAATG CAAACATCAA CTCAAAACTT
1501 CGCTTGCAGC TATATTACCC ACCTACTAAG CCTCGATCCC CATTAAGTGT
1551 TGTTGAGGCA TTTGAATGGT GGTCAAAATG TCCAAGAAAT CATTACCCAT
1601 CAACACTGTA CGTAACTGTA AGAGGACTGA AAGTTCCAGA CTGTATAAAG
1651 CCATCTTACC GCTCTATGAT GGCTCTTCAG GAGGAAAAAG GTAAACCAGT
1701 GCATTGTGAA CGTCACCATG AGTCAAGCTC AGTAGACACA GAACCTGGAT
1751 TAGAAGAGTC AAAGGAAGTA ATAAAGTGGA AACGACTCCC TGGGCAGGCT
1801 TGCCGTATCC CAAACAAACA CCTCTTCTCA CTAAATGCAG GAGAACGAGG
1851 ATGTTTTTGT CTTGATTTCT CCCACAATGG AAGAATATTA GCAGCAGCTT
1901 GTGCCAGCCG GGATGGATAT CCAATTATTT TATATGAAAT TCCTTCTGGA
1951 CGTTTCATGA GAGAATTGTG TGGCCACCTC AATATCATTT ATGATCTTTC
2001 CTGGTCAAAA GATGATCACT ACATCCTTAC TTCATCATCT GATGGCACTG
2051 CCAGGATATG GAAAAATGAA ATAAACAATA CAAATACTTT CAGAGTTTTA
2101 CCTCATCCTT CTTTTGTTTA CACGGCTAAA TTCCATCCAG CTGTAAGAGA
2151 GCTAGTAGTT ACAGGATGCT ATGATTCCAT GATACGGATA TGGAAAGTTG
2201 AGATGAGAGA AGATTCTGCC ATATTGGTCC GACAGTTTGA TGTTCACAAA
2251 AGTTTTATCA ACTCACTTTG TTTTGATACT GAAGGTCATC ATATGTATTC
2301 AGGAGATTGT ACAGGGGTGA TTGTTGTTTG GAATACCTAT GTCAAGATTA
2351 ATGATTTGGA ACATTCAGTG CACCACTGGA CTATAAATAA GGAAATTAAA
2401 GAAACTGAGT TTAAGGGAAT TCCAATAAGT TATTTGGAGA TTCATCCCAA
2451 TGGAAAACGT TTGTTAATCC ATACCAAAGA CAGTACTTTG AGAATTATGG
2501 ATCTCCGGAT ATTAGTAGCA AGGAAGTTTG TAGGAGCAGC AAATTATCGG
2551 GAGAAGATTC ATAGTACTTT GACTCCATGT GGGACTTTTC TGTTTGCTGG
2601 AAGTGAGGAT GGTATAGTGT ATGTTTGGAA CCCAGAAACA GGAGAACAAG
2651 TAGCCATGTA TTCTGACTTG CCATTCAAGT CACCCATTCG AGACATTTCT
2701 TATCATCCAT TTGAAAATAT GGTTGCATTC TGTGCATTTG GGCAAAATGA
2751 GCCAATTCTT CTGTATATTT ACGATTTCCA TGTTGCCCAG CAGGAGGCTG
2801 AAATGTTCAA ACGCTACAAT GGAACATTTC CATTACCTGG AATACACCAA
2851 AGTCAAGATG CCCTATGTAC CTGTCCAAAA CTACCCCATC AAGGCTCTTT
2901 TCAGATTGAT GAATTTGTCC ACACTGAAAG TTCTTCAACG AAGATGCAGC
2951 TAGTAAAACA GAGGCTTGAA ACTGTCACAG AGGTGATACG TTCCTGTGCT
3001 GCAAAAGTCA ACAAAAATCT CTCATTTACT TCACCACCAG CAGTTTCCTC
3051 ACAACAGTCT AAGTTAAAGC AGTCAAACAT GCTGACCGCT CAAGAGATTC
3101 TACATCAGTT TGGTTTCACT CAGACCGGGA TTATCAGCAT AGAAAGAAAG
3151 CCTTGTAACC ATCAGGTAGA TACAGCACCA ACGGTAGTGG CTCTTTATGA
3201 CTACACAGCG AATCGATCAG ATGAACTAAC CATCCATCGC GGAGACATTA
3251 TCCGAGTGTT TTTCAAAGAT AATGAAGACT GGTGGTATGG CAGCATAGGA
3301 AAGGGACAGG AAGGTTATTT TCCAGCTAAT CATGTGGCTA GTGAAACACT
3351 GTATCAAGAA CTGCCTCCTG AGATAAAGGA GCGATCCCCT CCTTTAAGCC
3401 CTGAGGAAAA AACTAAAATA GAAAAATCTC CAGCTCCTCA AAAGCAATCA
3451 ATCAATAAGA ACAAGTCCCA GGACTTCAGA CTAGGCTCAG AATCTATGAC
3501 ACATTCTGAA ATGAGAAAAG AACAGAGCCA TGAGGACCAA GGACACATAA
3551 TGGATACACG GATGAGGAAG AACAAGCAAG CAGGCAGAAA AGTCACTCTA
3601 ATAGAGTAAA GAATTGAAGA AAAGTTAAGA GCTGCCGAAA TGCACAGAGG
3651 TGAAAATGAC AAACCAAATG GAATTTCTCT TCAGAGTTCA GAATTTTCAG
3701 ATACTAAGGA GGAAGAAAGG ATCCACTACT TCTTGTTCTT ATGAATGACT
3751 CTAGAAAAAT CAGAATCAAG TTGTGGGTGG AAAAATCAAC GTGGCCTTTG
3801 AGTTCAGTTG TTATAAACCA TTGTGACTAT TGTTGGTCAA AGTATTGGTA
3851 CTTATATTGT TAGTAATTGC ATCATAATTA CATTACCAGT GTTGGAAAAC
3901 TAATGAAGAA AACACTGTAA TTGCTACTCA GCAAATGTGA ATAAAAGGTG
3951 TTTGCGTTAT TAGGATGTCT GTTAAGTAAT CATTTAATAT TATTATATTG
4001 GTAATGGTTG TATGTGTGAT GCTATGCCCA GAATATGAAG TATCTGTTTT
4051 TGAAATTCAC TTTATTTAAA AGATAAGCAG CTGACTGGGC ACGGTGCCTC
4101 ATGCCTGTAA TCCTAGCACC TTGGGAGGCT GAGGCAGGTG GATCACCTAA
4151 GGTCAGGAGT TCAACAACAC CAGCCTGACC AACATGGTGA AACCCCATCT
4201 CTACTAAAAA TACAAAAATC AGCCGGGTCT CATGGCAGGC ACCTGTAATC
4251 CCATCTACTG AGGCAGGAGA ATTGCTTGAC CCAGGAGGCA GAGGTTGCAG
4301 TGAGCCAAGA TCACGCCATT GCACTCCAGC CTGGGGGACA GAGCAAGACT
4351 CTATCTCCAA AAAACAAAAA AGATAAGCAG CTTTAGAATA TGGCGCATTC
4401 AAAACAGTCT CAGTAACAAA GACATTAAAA GAAAACAATT TACTTTCTAA
4451 TTAAAATTTT GTGTTTCTTA AGATCAAATC ATATAGGTAA CTTCATAGAC
4501 CTAAATTAAA AGTGATTTTT GGCTGGACTG GCAACAATGT TCCCAATGTC
4551 TTTACTTTTT AAAAAAGGCT TTTCATATTT AAGCACATAC CTATTTTGTA
4601 GACTTACATT GTTTAATATT TATTTTAATC TTAATATTTT TACATTATTA
4651 TATTGCATTA TTTATTTTTT CTAAGTTCCA GAATAATAGT GTCATTATTA
4701 TAGACTATAT GTTTTGAAGT TTGATATTAT AATGGGATAT TCATTTTTTG
4751 TTCTTTTCTT GACTCCTTTC TCAAGTGTGT GATAAGGTCT GCTGATAAAA
4801 TATTTAACCC CAAGAAAGTG AAAACTAATA TAAAATTAGA AAGACCTATC
4851 CAAATTAGAC AGTCAATTCC ATTAAAATAA GAAGTGAGAA AAACAATGTT
4901 GGGCATTGAG GTGTAAATTT TGCCCAGATG TATACCCAGT GTGAAATATC
4951 TTCTAATAAA AATATATTTG GCTCTTATCC CTGCACATGT AGAGGCATAA
5001 AAATTGGTAA ACATGTCCCG CTGTGTAGAA CTTTAAAAAA AAGGCATTTT
5051 TGAAAGTGTT GAGTGGCACT GATAACTGGT GAAGCCTACA GCCATCCGCC
5101 CAAAAGTCTG TTCTGATGGC ACTGAGTTTT CATTGTTCTG GATGTATAAG
5151 TCTGTGTGTC AGGTACAGCT GGGCCCAGCC AGCTTGAGTC ACTCTTGTAC
5201 AAGCTTGTTT TTTTCTGTCT TGTGAATGCA CTTGATAATT TAAAAATAAA
5251 AATATCTGTT TCTCTGCAAA AAAAAAA
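
The reading frame stated for this clone (ORF starting at base 19) can be checked against the listing. The following is a minimal sketch, not part of the original analysis: it translates the first ten codons of the ORF from the first 50 bases of the DKFZphtes3_ln3 cDNA using the standard genetic code. The names `translate`, `CODON_TABLE` and `FIRST_50` are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, start_bp, n_codons):
    """Translate n_codons codons of dna; start_bp is a 1-based position."""
    begin = start_bp - 1
    codons = (dna[i:i + 3] for i in range(begin, begin + 3 * n_codons, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


# Bases 1-50 of the DKFZphtes3_ln3 cDNA, copied from the listing (spaces removed).
FIRST_50 = "GCTGCATAAAGCTGAGAGATGCCTACAGCTGAGAGTGAAGCAAAAGTAAA"

print(translate(FIRST_50, 19, 10))  # MPTAESEAKV
```

The result agrees with the first ten residues of the frame 1 peptide given for this clone.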



BLAST Results 



Entry HS32B1 from database EMBL: 

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 32B1
Score = 4445, P = 0.0e+00, identities = 889/889

Entry U93816 from database EMBL:

Human exon-trapped sequence from 6q24.

Score = 965, P = 4.0e-35, identities = 193/193



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 19 bp to 3606 bp; peptide length: 1196 
Category: similarity to known protein 
1 MPTAESEAKV KTKVRFEKLL KTHSDLMREK KKLKKKLVRS EENISPDTIR

51 SNLHYMKETT SDDPDTIRSN LPHIKETTSD DVSAANTNNL KKSTRVTKNK 

101 LRNTQLATEN PNGDASVEED KQGKPNKKVI KTVPQLTTQD LKPETPENKV 

151 DSTHQKTHTK PQPGVDHQKS EKANEGREET DLEEDEELMQ AYQCHVTEEM 

201 AKEIKRKIRK KLKEQLTYFP SDTLFHDDKL SSEKRKKKKE VPVFSKAETS 

251 TLTISGDTVE GEQKKESSVR SVSSDSHQDD EISSMEQSTE DSMQDDTKPK 

301 PKKTKKKTKA VADNNEDVDG DGVHEITSRD SPVYPKCLLD DDLVLGVYIH 

351 RTDRLKSDFM ISHPMVKIHV VDEHTGQYVK KDDSGRPVSS YYEKENVDYI 

401 LPIMTQPYDF KQLKSRLPEW EEQIVFNENF PYLLRGSDES PKVILFFEIL 

451 DFLSVDEIKN NSEVQNQECG FRKIAWAFLK LLGANGNANI NSKLRLQLYY 

501 PPTKPRSPLS VVEAFEWWSK CPRNHYPSTL YVTVRGLKVP DCIKPSYRSM 

551 MALQEEKGKP VHCERHHESS SVDTEPGLEE SKEVIKWKRL PGQACRIPNK 

601 HLFSLNAGER GCFCLDFSHN GRILAAACAS RDGYPIILYE IPSGRFMREL

651 CGHLNIIYDL SWSKDDHYIL TSSSDGTARI WKNEINNTNT FRVLPHPSFV

701 YTAKFHPAVR ELVVTGCYDS MIRIWKVEMR EDSAILVRQF DVHKSFINSL

751 CFDTEGHHMY SGDCTGVIVV WNTYVKINDL EHSVHHWTIN KEIKETEFKG 

801 IPISYLEIHP NGKRLLIHTK DSTLRIMDLR ILVARKFVGA ANYREKIHST 

851 LTPCGTFLFA GSEDGIVYVW NPETGEQVAM YSDLPFKSPI RDISYHPFEN 

901 MVAFCAFGQN EPILLYIYDF HVAQQEAEMF KRYNGTFPLP GIHQSQDALC 

951 TCPKLPHQGS FQIDEFVHTE SSSTKMQLVK QRLETVTEVI RSCAAKVNKN 

1001 LSFTSPPAVS SQQSKLKQSN MLTAQEILHQ FGFTQTGIIS IERKPCNHQV 

1051 DTAPTVVALY DYTANRSDEL TIHRGDIIRV FFKDNEDWWY GSIGKGQEGY 

1101 FPANHVASET LYQELPPEIK ERSPPLSPEE KTKIEKSPAP QKQSINKNKS 

1151 QDFRLGSESM THSEMRKEQS HEDQGHIMDT RMRKNKQAGR KVTLIE 
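
The stated ORF coordinates and peptide length are related by simple arithmetic. As a minimal sketch (the function name `orf_peptide_length` is a hypothetical helper, and it assumes inclusive 1-based coordinates that exclude the stop codon, which is what the figures in this document imply):

```python
def orf_peptide_length(start_bp, end_bp):
    """Number of residues encoded by an ORF given inclusive 1-based coordinates."""
    n_bases = end_bp - start_bp + 1
    if n_bases % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return n_bases // 3


# ORF of DKFZphtes3_ln3, frame 1: 19 bp to 3606 bp.
print(orf_peptide_length(19, 3606))  # 1196
```

The same relation holds for the other ORFs listed in this section (116 to 1882 bp gives 589 residues; 618 to 2741 bp gives 708 residues).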

BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_ln3, frame 1

TREMBL:U92792_1 gene: "tup1"; product: "Tup1"; Schizosaccharomyces
pombe general transcriptional repressor Tup1 (tup1) mRNA, complete
cds., N = 1, Score = 186, P = 1e-10

TREMBL:AF104258_1 gene: "Pmc733"; product: "putative copper-inducible
35.6 kDa protein"; Festuca rubra putative copper-inducible 35.6 kDa
protein (Pmc733) mRNA, complete cds., N = 1, Score = 235, P = 4.6e-18

TREMBL:SPAC3H5_8 gene: "SPAC3H5.08c"; product: "beta-transducin";
S. pombe chromosome I cosmid c3H5., N = 2, Score = 231, P = 2e-14

PIR:T02533 hypothetical protein F13M22.17 - Arabidopsis thaliana, N =
2, Score = 228, P = 1e-13

TREMBL:AF104258_1 gene: "Pmc733"; product: "putative copper-inducible
35.6 kDa protein"; Festuca rubra putative copper-inducible 35.6 kDa
protein (Pmc733) mRNA, complete cds., N = 1, Score = 235, P = 4.6e-18

TREMBL:SPAC3H5_8 gene: "SPAC3H5.08c"; product: "beta-transducin";
S. pombe chromosome I cosmid c3H5., N = 2, Score = 231, P = 2e-14

TREMBL:CER03E1_1 gene: "R03E1.1"; Caenorhabditis elegans cosmid R03E1,
N = 1, Score = 215, P = 2.3e-13

SWISSPROT:YZLL_CAEEL HYPOTHETICAL 43.1 KD TRP-ASP REPEATS CONTAINING
PROTEIN K04G11.4 IN CHROMOSOME X., N = 1, Score = 203, P = 7.1e-13



>TREMBL:AF104258_1 gene: "Pmc733"; product: "putative copper-inducible 35.6 
kDa protein"; Festuca rubra putative copper-inducible 35.6 kDa protein 
(Pmc733) mRNA, complete cds. 
Length = 321 

HSPs: 

Score = 235 (35.3 bits), Expect = 4.6e-18, P = 4.6e-18
Identities = 59/225 (26%), Positives = 111/225 (49%)

Query: 647 MRELCGHLNIIYDLSWSKDDHYILTSSSDGTARIWKNEINNTNTFRVLPHPSFVYTAKFH 706
+ E GH + I DLSWSK+ +L++S D T R+W ++ + +V H ++V +F+
Sbjct: 63 VHEFYGHGDAILDLSWSKNGD-LLSASMDKTVRLW--QVGRDSCLKVFSHTNYVTCVQFN 119

Query: 707 PAVRELVVTGCYDSMIRIWKVEMREDSAILVRQFDVHKSFINSLCFDTEGHHMYSGDCTG 766
P +TGC D ++RIW V LV + K + ++C+ +G +G TG
Sbjct: 120 PTNGNYFITGCIDGLVRIWDVRK-----CLVVDWANSKEIVTAVCYRPDGKGAVAGTITG 174

Query: 767 VIVVWNTYVKINDLEHSVHHWTINKEIKETEFKGIPISYLEIHPNGKRLLIHTKDSTLRI 826
++ +LE V ++N K + + Y P K+L++ + D+ +RI
Sbjct: 175 NCRYYDASENRLELESQV---SLNGRKKSLHKRIVGFQYCPSDP--KKLMVTSGDAQVRI 229

Query: 827 MDLRILVARKFVGAANYREKIHSTLTPCGTFLFAGSEDGIVYVWN 871
+D +++ +G+ ++ + TP G + + S+D +Y+WN
Sbjct: 230 LDGAHVISN-YKGLQS-SSQVARSFTPDGDHIVSASDDSRIYMWN 272
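
The percentages reported in the HSP header follow directly from the match counts and the alignment length. A minimal sketch, assuming the percentage is truncated to a whole number (an assumption that is consistent with the figures shown here; `percent` is a hypothetical helper name):

```python
def percent(matches, aligned_length):
    """Integer percentage of matching positions, truncated toward zero."""
    return 100 * matches // aligned_length


# Values from the HSP above: Identities = 59/225, Positives = 111/225.
print(percent(59, 225))   # 26
print(percent(111, 225))  # 49
```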

Pedant information for DKFZphtes3_ln3, frame 1 



Report for DKFZphtes3_ln3.1

[LENGTH] 1196

[MW] 137114.70

[pI] 6.79

[HOMOL] SWISSPROT:YKY4_CAEEL HYPOTHETICAL 40.4 KD TRP-ASP REPEATS CONTAINING PROTEIN
C14B1.4 IN CHROMOSOME III. 8e-21

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YKL121w] 2e-11

[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YBR198c TAF90 - TFIID subunit] 4e-10

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YBR198c TAF90 - TFIID subunit] 4e-10

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YPR178w] 1e-08

[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPR178w] 1e-08

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YDR364c] 4e-08

[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YDR364c] 4e-08

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL145c] 9e-08

[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YDL145c] 9e-08

[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YCR084c] 2e-07

[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YHL002w] 7e-07

[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YFR024c-a] 2e-06

[FUNCAT] 02.16 fermentation [S. cerevisiae, YMR116c] 4e-06

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YMR116c] 4e-06

[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YMR116c] 4e-06

[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YFL009w] 4e-05

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YFL009w] 4e-05

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YFL009w] 4e-05

[FUNCAT] 03.01 cell growth [S. cerevisiae, YCR088w] 6e-05

[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YCR057c] 7e-05

[BLOCKS] BL00024H 

[SCOP] d1tbgd_ 2.46.3.1.1 beta1-subunit of the signal-transducing 3e-91

[SCOP] d1gfc 2.21.2.1.9 Growth factor receptor-bound protein 2 (GRB2), N 4e-14

[SCOP] d1fmk_1 2.21.2.1.8 (1-64) c-src tyrosine kinase [human (Hom 5e-15

[SCOP] d1ad5b1 2.21.2.1.7 (1-63) Hemapoetic cell kinase Hck [human (Hom 3e-15

[SCOP] d1lcka1 2.21.2.1.16 (1-54) p56-lck tyrosine kinase, SH3 domain [huma 1e-13

[SCOP] d1qwea_ 2.21.2.1.15 Src kinase, SH3 domain [Avian sarcoma virus 2e-15

[SCOP] d1shg 2.21.2.1.6 alpha-Spectrin, SH3 domain [chicken (Gallu 2e-13

[SCOP] d1prmc_ 2.21.2.1.13 Src kinase, SH3 domain [chicken (Gallus gallus) 2e-15

[SCOP] d1hsq 2.21.2.1.12 Phospholipase C, SH3 domain [human (Hom 2e-13

[SCOP] d1aboa_ 2.21.2.1.3 Abl tyrosine kinase, SH3 domain [Mouse (Mu 3e-13

[SCOP] d1efna_ 2.21.2.1.2 Fyn, SH3 domain [human (Homo sapiens) 2e-15

[SCOP] d1sema_ 2.21.2.1.11 Growth factor receptor-bound protein 2 (GRB2), N 1e-13

[SCOP] d1gbqa_ 2.21.2.1.10 Growth factor receptor-bound protein 2 (GRB2), N 3e-16

[SCOP] d1ckaa_ 2.21.2.1.1 C-Crk, N-terminal SH3 domain [mouse (Mu 3e-15

[EC] 3.1.4.3 Phospholipase C 2e-07

[EC] 3.1.4.11 1-Phosphatidylinositol-4,5-bisphosphate phosphodiesterase 7e-07

[EC] 3.6.1.32 Myosin ATPase 7e-07

[EC] 2.7.1.112 Protein-tyrosine kinase 8e-06

[PIRKW] nucleus 2e-08

[PIRKW] phosphotransferase 8e-06

[PIRKW] plasma 4e-07

[PIRKW] duplication 4e-07

[PIRKW] phosphoric diester hydrolase 2e-07

[PIRKW] tandem repeat 7e-07

[PIRKW] hormone 4e-07

[PIRKW] transmembrane protein 2e-06

[PIRKW] stomach 4e-07

[PIRKW] actin binding 7e-07

[PIRKW] ATP 7e-07

[PIRKW] phosphoprotein 7e-07 

[PIRKW] signal transduction 7e-09 

[PIRKW] heterotrimer 7e-09 

[PIRKW] P-loop 7e-07 

[PIRKW] hydrolase 7e-07 

[PIRKW] transcription regulation 5e-06 

[PIRKW] GTP binding 7e-09 
[SUPFAM] 1-phosphatidylinositol-4,5-bisphosphate phosphodiesterase II 2e-07

[SUPFAM] SH3 homology 2e-07

[SUPFAM] SH2 homology 2e-07

[SUPFAM] protozoan myosin heavy chain IB 7e-07

[SUPFAM] myosin motor domain homology 7e-07

[SUPFAM] pleckstrin repeat homology 2e-07

[SUPFAM] protein-tyrosine kinase src 8e-06

[SUPFAM] WD repeat homology 3e-12

[SUPFAM] 1-phosphatidylinositol-4,5-bisphosphate phosphodiesterase domain Y homology 2e-07

[SUPFAM] protein kinase homology 8e-06

[SUPFAM] 1-phosphatidylinositol-4,5-bisphosphate phosphodiesterase domain X homology 2e-07

[SUPFAM] GTP-binding regulatory protein beta chain 7e-09

[SUPFAM] yeast coatomer complex alpha chain 4e-07

[PROSITE] RGD 1

[PROSITE] MYRISTYL 6

[PROSITE] AMIDATION 2

[PROSITE] CAMP_PHOSPHO_SITE 4

[PROSITE] CK2_PHOSPHO_SITE 25

[PROSITE] TYR_PHOSPHO_SITE 4

[PROSITE] PKC_PHOSPHO_SITE 19

[PROSITE] ASN_GLYCOSYLATION 6

[PFAM] Src homology domain 3

[PFAM] WD domain, G-beta repeats

[KW] Irregular

[KW] 3D

[KW] LOW_COMPLEXITY 5.77 %

[KW] COILED_COIL 2.42 %



SEQ MPTAESEAKVKTKVRFEKLLKTHSDLMREKKKLKKKLVRSEENISPDTIRSNLHYMKETT
SEG xxxxxxxx
COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCC
1gotB

SEQ SDDPDTIRSNLPHIKETTSDDVSAANTNNLKKSTRVTKNKLRNTQLATENPNGDASVEED
SEG
COILS
1gotB

SEQ KQGKPNKKVIKTVPQLTTQDLKPETPENKVDSTHQKTHTKPQPGVDHQKSEKANEGREET
SEG xxx
COILS
1gotB

SEQ DLEEDEELMQAYQCHVTEEMAKEIKRKIRKKLKEQLTYFPSDTLFHDDKLSSEKRKKKKE
SEG xxxxxxxx xxxxxxxxxxxxxxx xxxxxxxxxxxx
COILS
1gotB

SEQ VPVFSKAETSTLTISGDTVEGEQKKESSVRSVSSDSHQDDEISSMEQSTEDSMQDDTKPK
SEG xxxxxxxxxx xxxx
COILS
1gotB

SEQ PKKTKKKTKAVADNNEDVDGDGVHEITSRDSPVYPKCLLDDDLVLGVYIHRTDRLKSDFM
SEG xxxxxxxxx
COILS
1gotB

SEQ ISHPMVKIHVVDEHTGQYVKKDDSGRPVSSYYEKENVDYILPIMTQPYDFKQLKSRLPEW
SEG
COILS
1gotB

SEQ EEQIVFNENFPYLLRGSDESPKVILFFEILDFLSVDEIKNNSEVQNQECGFRKIAWAFLK
SEG
COILS
1gotB

SEQ LLGANGNANINSKLRLQLYYPPTKPRSPLSVVEAFEWWSKCPRNHYPSTLYVTVRGLKVP
SEG
COILS
1gotB

SEQ DCIKPSYRSMMALQEEKGKPVHCERHHESSSVDTEPGLEESKEVIKWKRLPGQACRIPNK
SEG
COILS
1gotB

SEQ HLFSLNAGERGCFCLDFSHNGRILAAACASRDGYPIILYEIPSGRFMRELCGHLNIIYDL
SEG
COILS

SEQ SWSKDDHYILTSSSDGTARIWKNEINNTNTFRVLPHPSFVYTAKFHPAVRELVVTGCYDS
SEG
COILS
1gotB EETTTTTEEEEEETTTEEEEEETT--TTCEEEEEETTTCEEEEEETTT-TCEEEEEETTT

SEQ MIRIWKVEMREDSAILVRQFDVHKSFINSLCFDTEGHHMYSGDCTGVIVVWNTYVKINDL
SEG
COILS
1gotB EEEEEETTTTTBTTEEEEEEECCCCCE-EEEEEEETTEEEEEETTTEEEEEE

SEQ EHSVHHWTINKEIKETEFKGIPISYLEIHPNGKRLLIHTKDSTLRIMDLRILVARKFVGA
SEG
COILS
1gotB

SEQ ANYREKIHSTLTPCGTFLFAGSEDGIVYVWNPETGEQVAMYSDLPFKSPIRDISYHPFEN
SEG
COILS
1gotB

SEQ MVAFCAFGQNEPILLYIYDFHVAQQEAEMFKRYNGTFPLPGIHQSQDALCTCPKLPHQGS
SEG
COILS
1gotB

SEQ FQIDEFVHTESSSTKMQLVKQRLETVTEVIRSCAAKVNKNLSFTSPPAVSSQQSKLKQSN
SEG
COILS
1gotB

SEQ MLTAQEILHQFGFTQTGIISIERKPCNHQVDTAPTVVALYDYTANRSDELTIHRGDIIRV
SEG
COILS
1gotB

SEQ FFKDNEDWWYGSIGKGQEGYFPANHVASETLYQELPPEIKERSPPLSPEEKTKIEKSPAP
SEG
COILS
1gotB

SEQ QKQSINKNKSQDFRLGSESMTHSEMRKEQSHEDQGHIMDTRMRKNKQAGRKVTLIE
SEG
COILS
1gotB

1gotB CEEEEEECCCCCEEEE



Prosite for DKFZphtes3_ln3.1



PS00001  460->464    ASN_GLYCOSYLATION  PDOC00001
PS00001  686->690    ASN_GLYCOSYLATION  PDOC00001
PS00001  934->938    ASN_GLYCOSYLATION  PDOC00001
PS00001  1000->1004  ASN_GLYCOSYLATION  PDOC00001
PS00001  1065->1069  ASN_GLYCOSYLATION  PDOC00001
PS00001  1148->1152  ASN_GLYCOSYLATION  PDOC00001
PS00004  91->95      CAMP_PHOSPHO_SITE  PDOC00004
PS00004  264->268    CAMP_PHOSPHO_SITE  PDOC00004
PS00004  305->309    CAMP_PHOSPHO_SITE  PDOC00004
PS00004  1190->1194  CAMP_PHOSPHO_SITE  PDOC00004
PS00005  48->51      PKC_PHOSPHO_SITE   PDOC00005
PS00005  66->69      PKC_PHOSPHO_SITE   PDOC00005
PS00005  93->96      PKC_PHOSPHO_SITE   PDOC00005
PS00005  170->173    PKC_PHOSPHO_SITE   PDOC00005
PS00005  232->235    PKC_PHOSPHO_SITE   PDOC00005
PS00005  268->271    PKC_PHOSPHO_SITE   PDOC00005
PS00005  304->307    PKC_PHOSPHO_SITE   PDOC00005
PS00005  327->330    PKC_PHOSPHO_SITE   PDOC00005
PS00005  352->355    PKC_PHOSPHO_SITE   PDOC00005
PS00005  384->387    PKC_PHOSPHO_SITE   PDOC00005
PS00005  440->443    PKC_PHOSPHO_SITE   PDOC00005
PS00005  533->536    PKC_PHOSPHO_SITE   PDOC00005
PS00005  546->549    PKC_PHOSPHO_SITE   PDOC00005
PS00005  643->646    PKC_PHOSPHO_SITE   PDOC00005
PS00005  677->680    PKC_PHOSPHO_SITE   PDOC00005
PS00005  690->693    PKC_PHOSPHO_SITE   PDOC00005
PS00005  702->705    PKC_PHOSPHO_SITE   PDOC00005
PS00005  823->826    PKC_PHOSPHO_SITE   PDOC00005
PS00005  973->976    PKC_PHOSPHO_SITE   PDOC00005
PS00006  22->26      CK2_PHOSPHO_SITE   PDOC00006
PS00006  59->63      CK2_PHOSPHO_SITE   PDOC00006
PS00006  77->81      CK2_PHOSPHO_SITE   PDOC00006
PS00006  116->120    CK2_PHOSPHO_SITE   PDOC00006
PS00006  137->141    CK2_PHOSPHO_SITE   PDOC00006
PS00006  180->184    CK2_PHOSPHO_SITE   PDOC00006
PS00006  245->249    CK2_PHOSPHO_SITE   PDOC00006
PS00006  276->280    CK2_PHOSPHO_SITE   PDOC00006
PS00006  283->287    CK2_PHOSPHO_SITE   PDOC00006
PS00006  288->292    CK2_PHOSPHO_SITE   PDOC00006
PS00006  292->296    CK2_PHOSPHO_SITE   PDOC00006
PS00006  327->331    CK2_PHOSPHO_SITE   PDOC00006
PS00006  390->394    CK2_PHOSPHO_SITE   PDOC00006
PS00006  454->458    CK2_PHOSPHO_SITE   PDOC00006
PS00006  510->514    CK2_PHOSPHO_SITE   PDOC00006
PS00006  570->574    CK2_PHOSPHO_SITE   PDOC00006
PS00006  663->667    CK2_PHOSPHO_SITE   PDOC00006
PS00006  672->676    CK2_PHOSPHO_SITE   PDOC00006
PS00006  804->808    CK2_PHOSPHO_SITE   PDOC00006
PS00006  985->989    CK2_PHOSPHO_SITE   PDOC00006
PS00006  1023->1027  CK2_PHOSPHO_SITE   PDOC00006
PS00006  1127->1131  CK2_PHOSPHO_SITE   PDOC00006
PS00006  1132->1136  CK2_PHOSPHO_SITE   PDOC00006
PS00006  1161->1165  CK2_PHOSPHO_SITE   PDOC00006
PS00006  1170->1174  CK2_PHOSPHO_SITE   PDOC00006
PS00007  1083->1091  TYR_PHOSPHO_SITE   PDOC00007
PS00007  211->219    TYR_PHOSPHO_SITE   PDOC00007
PS00007  1083->1091  TYR_PHOSPHO_SITE   PDOC00007
PS00007  210->219    TYR_PHOSPHO_SITE   PDOC00007
PS00008  483->489    MYRISTYL           PDOC00008
PS00008  577->583    MYRISTYL           PDOC00008
PS00008  716->722    MYRISTYL           PDOC00008
PS00008  800->806    MYRISTYL           PDOC00008
PS00008  861->867    MYRISTYL           PDOC00008
PS00008  941->947    MYRISTYL           PDOC00008
PS00009  811->815    AMIDATION          PDOC00009
PS00009  1188->1192  AMIDATION          PDOC00009
PS00016  1074->1077  RGD                PDOC00016
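
Two of the simpler motif positions in this table can be reproduced with regular expressions. The following is a minimal sketch, not the Prosite scanning software itself: it assumes the N-glycosylation consensus N-{P}-[ST]-{P} (PS00001) and the literal tripeptide RGD (PS00016), and applies them to two short fragments of the frame 1 peptide. The names `MOTIFS` and `scan` are illustrative.

```python
import re

# Regular-expression forms of two Prosite patterns (simplified).
MOTIFS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # PS00001: N-{P}-[ST]-{P}
    "RGD": r"RGD",                          # PS00016
}


def scan(fragment, first_residue, pattern):
    """Return 1-based start positions of all (possibly overlapping) matches."""
    lookahead = re.compile(f"(?={pattern})")
    return [m.start() + first_residue for m in lookahead.finditer(fragment)]


# Residues 991-1010 and 1071-1080 of the DKFZphtes3_ln3 frame 1 peptide.
print(scan("RSCAAKVNKNLSFTSPPAVS", 991, MOTIFS["ASN_GLYCOSYLATION"]))  # [1000]
print(scan("TIHRGDIIRV", 1071, MOTIFS["RGD"]))                          # [1074]
```

Both start positions agree with the corresponding table rows (1000->1004 and 1074->1077).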



Pfam for DKFZphtes3_ln3.1

HMM_NAME WD domain, G-beta repeats

HMM *MrGHnnWVWCVaFS PDGrWFI vSGSWDgTCRLWD*
+ GH+N ++++++S D ++ I+++S DGT R+W
Query 650 LCGHLNIIYDLSWSKDDHY-ILTSSSDGTARIWK

HMM_NAME Src homology domain 3

HMM *pyVIALYDYqAqdpDELSFkEGDIIiIIEdsDD.WWrgRnnnTNGQEGW
P+V+ALYDY+A+++DEL++ +GDII + +t++ WW+G GQEG+
Query 1054 PTVVALYDYTANRSDELTIHRGDIIRVFFKDNEDWWYGSIGK--GQEGY 1100

HMM IPSNYVEPi*
+P+N V+ +
Query 1101 FPANHVASE 1109

DKFZphtes3_20c21

group: testes derived

DKFZphtes3_20c21 encodes a novel 708 amino acid protein without similarity to known proteins.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif.

The new protein can find application in studying the expression profile of testis-specific
genes.

unknown

Sequenced by MediGenomix
Locus: /map="22q11.2-12.2"
Insert length: 3997 bp 

Poly A stretch at pos. 3877, polyadenylation signal at pos. 3853 

1 GGTAGGCGGG GCGGCGCGTG ACCTAAGGCC TCTCTGCCGC GCGCGCAGGT 
51 ACGGGGCAGA AGTCGCAGGT ACCCAGCTGC TGCCCACGTT TCTGGTCCAG 

101 AGTCCCGAAC CCCGAGCACT GGGATGCCTG GCTACTCCGA GCCAAGGCAC 

151 TGATGTTTGA ACTGGAAACT TCAAAACGTT TAATAAGAGT CTTCAGGATG 

201 GGTTTGAACT AGACAAGCTA GAAATTTCTT TAGAACACCA GCTCTAGCAT 

251 GCATCTCCCA CTTTTGGCTT TCCTGGAGAG GAGCTTGAAG AGGTGGTTCT 

301 GCAGACAGCC ACAGTGATAC TCAGGAAACC AGAGGAATGG ATTTGACTTT 

351 TCTGCTAGGA TTCTTTGTTA TAGTTTCTCC CTGAGTTGTA AGAGGCATGG 

401 AAATATACAT GAAACTGAAG AACCTGCAAG GAAGGGAAGT GGAACTTTCC 

451 ATGCTGAGTG AAAACTAACC AAGTGGCAGT TGTGACTGAA AACACTGAAA 

501 CCTACCACGT CCAGATTCAC TGGATTGGGG GATAGAGGAA CGGTCACAGC 

551 TAGGGAGAAA GAAGTGATAC CGGAAAAGAA AACCTAAATG AAGAGAATGA 

601 GGATGACTGC ACAGTAGATG GCCACCTCTA CCTCCACAGA GGCAAAGTCA 

651 GCCTCGTGGT GGAATTATTT TTTTCTTTAT GATGGTTCCA AGGTAAAGGA 

701 AGAAGGCGAT CCAACAAGAG CTGGCATTTG TTACTTTTAT CCTTCCCAGA 

751 CCCTGCTAGA CCAACAGGAG TTGCTTTGTG GACAGATTGC TGGAGTTGTC 

801 CGCTGTGTTT CTGACATTTC TGACTCTCCT CCTACTCTTG TTCGTCTGAG 

851 AAAACTGAAG TTTGCCATAA AAGTTGATGG AGATTACCTT TGGGTGCTGG 

901 GCTGTGCTGT GGAGCTCCCT GATGTCAGCT GCAAGCGGTT TCTGGATCAG 

951 CTAGTTGGAT TCTTTAATTT TTACAATGGA CCTGTTTCCC TAGCTTATGA 
1001 GAACTGTTCT CAGGAAGAAC TGAGCACGGA GTGGGACACC TTCATCGAGC 
1051 AAATTCTGAA AAACACCAGT GATCTGCATA AGATTTTCAA TTCCCTCTGG 
1101 AACTTGGACC AAACTAAAGT GGAGCCCCTG TTGTTGCTGA AGGCAGCCCG 
1151 CATTCTGCAG ACCTGCCAGC GCTCGCCTCA CATTCTCGCT GGCTGCATCC 
1201 TCTATAAAGG ACTGATTGTC AGCACCCAAC TCCCGCCCTC CCTCACCGCC 
1251 AAGGTCCTGC TTCACCGAAC AGCACCTCAG GAGCAGAGAC TCCCTACGGG 
1301 AGGGGATGCC CCGCAGGAAC ATGGAGCGGC ATTGCCCCCG AATGTCCAGA 
1351 TTATCCCTGT TTTTGTGACC AAAGAGGAAG CCATTAGTCT CCACGAGTTC 
1401 CCGGTGGAAC AGATGACAAG GTCTCTAGCA TCTCCAGCAG GACTCCAGGA 
1451 TGGTTCAGCC CAGCACCATC CAAAGGGTGG GAGCACATCT GCCCTGAAAG 
1501 AAAACGCCAC TGGCCATGTG GAATCCATGG CCTGGACCAC CCCAGATCCC 
1551 ACATCCCCTG ACGAAGCTTG TCCAGATGGC AGGAAGGAGA ACGGATGCTT 
1601 GTCTGGCCAT GATCTGGAGA GCATCAGGCC CGCAGGACTG CACAACTCTG 
1651 CCAGGGGTGA GGTTCTTGGC CTCAGCTCCT CCCTGGGGAA GGAACTAGTC 
1701 TTTCTCCAAG AAGAACTCGA CTTGTCTGAA ATCCACATTC CAGAGGCTCA 
1751 GGAAGTGGAA ATGGCCTCAG GTCATTTTGC CTTCCTACAT GTGCCTGTTC 
1801 CAGATGGCAG GGCTCCTTAC TGCAAGGCAT CTCTCAGCGC CTCCAGCAGC 
1851 CTGGAACCCA CGCCTCCTGA GGACACAGCC ATCAGCAGCT TGCGCCCTCC 
1901 CTCTGCTCCT GAGATGCTGA CCCAGCATGG AGCCCAAGAG CAGGTCGAAG 
1951 ACCATCCTGG CCATAGCAGC CAAGCCCCCA TTCCCAGAGC AGACCCTCTC 
2001 CCCAGAAGGA CCCGCAGGCC CTTGTTATTG CCTCGCTTAG ATCCAGGACA 
2051 GAGAGGAAAC AAGCTTCCCA CGGGGGAACA AGGCCTGGAT GAGGATGTTG 
2101 ATGGGGTCTG TGAAAGCCAC GCAGCCCCTG GTCTGGAATG CAGTTCAGGC 
2151 TCAGCAAACT GTCAGGGTGC TGGCCCCTCT GCAGATGGAA TCAGCTCCAG 
2201 GCTGACACCA GCAGAGTCCT GCATGGGGCT CGTGAGGATG AATCTCTACA 
2251 CTCACTGCGT CAAAGGGCTG ATGCTGTCCC TGCTGGCTGA GGAGCCGCTG 
2301 CTGGGAGACA GCGCAGCCAT AGAGGAAGTG TACCACAGCA GCCTGGCTTC 
2351 ACTGAATGGG CTGGAAGTCC ACCTGAAAGA GACGCTGCCC AGGGATGAGG 
2401 CAGCCTCCAC GAGCAGCACC TACAACTTCA CATATTACGA CCGCATTCAG 
2451 AGCTTGCTGA TGGCAAACCT GCCGCAGGTG GCCACCCCGC ATGATCGCCG 
2501 CTTCCTCCAG GCCGTCAGCC TGATGCATAG CGAATTTGCC CAGCTGCCCG 
2551 CGCTTTATGA AATGACTGTC AGAAATGCCT CCACGGCTGT GTACGCCTGT 
2601 TGCAACCCCA TCCAGGAGAC ATATTTCCAG CAGCTGGCAC CTGCAGCACG 
2651 GAGCTCCGGC TTCCCAAACC CTCAGGATGG CGCCTTCAGC CTCTCCGGCA 
2701 AAGCAAAGCA GAAGCTGCTG AAGCACGGGG TGAACTTGCT CTGAACTGCA 
2751 CCCAGGAGGT GACTGGGAAG GAGAAAACCA GCAAAGGAAG CTCTGCCTTT 
2801 TATAATTGAA AAGGCCCCTC TATTTTATTT TTCTTGAAAA CATTCCCTTT 
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2851 TTTAGGAACC AAATGATATT TGAGTTTTTG TTATTCCTTT 

2901 GATGTGTTTT GGGGGCAGGG GTTAGTTCTT CAGGTCGGCA 

2951 ACTTGATAAA GAACTGTATT TAATCGGTAG TGTTGGGGCC 

3001 TGGCTCCCTC TCTGCCATAC TGAGCCTGAG GTATTTCATA 

3051 TTCCATCCCA GCTTGAATTG GTGCCACAAG CTTCCAAGTT 

3101 CTAGAACCTG ATCGTCCACT AGCCCAGAGT GTGTGTGTTC 

3151 CCAGGTGGTG GTAGGCGGTG TGACTGCACA GCGAGGTGCC 

3201 GCAGGCCGAC TCCACTCCCA CGCCGCAGGT AGGTTTCTCC 

3251 TGCTGGGAGG TCCGGATCGT TCCTGCAGGG AAGCGGCAGC 

3301 CACTTGGTTG AATTCTGTTG GAACTCTACT CAAATCTAGG 

3351 TTGGACCCAC AATGGGGGCA AGCCTTAATA ATATGGAAGG 

3401 TTTAGAGATC CCTTTATAAA AGCTCTGGGG GCTGAGCCCT 

3451 TGACAACAGG ACCAACCTGC GCTGCCTTTG ACTACAAGTG 

3501 CTGGTTCCTC TCGAGCGAGT GTCCCTAAAT AGGAGTTTAC 

3551 GGGGTAAAAG CACTGTGCTT TTCAGTGGTG GCTGCGTGAA 

3601 ACTCAGCTGT GTGTTCCTGG GCTTGTGTGG TACTTAGAAC 

3651 TTACGTTATA GTCAGACATT TTTTTGACAG TATGAGACAG 

3701 GAAAATATTT GTCAAAATCT TAACTGAATG TTTACTGGAA 

3751 TTCCATTTGA GAGTTGTATT GTTAATAATT TCATGTCAGT 

3801 CTGATGTTTA TGATATGGTG TCTTTTTCTT GAAACAAGCT 

3851 AGAAATAAAA TAGCCAAAAA ATGCTGGAAA AAAAAAAAAA 

3901 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

3951 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 



BLAST Results 



Entry HS1048E9 from database EMBLNEW : 
Human DNA sequence from clone 1048E9 on chromosome 22qll,2-12.2 
Contains pseudogene similar to ribosomal protein S3A and part of a gene 
similar to C.elegans protein CE02118, ESTs, STS, GSS. 
Score « 6540, P « 0.0e+00, identities « 1308/1308 
-14 exons 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 618 bp to 2741 bp; peptide length: 708 
Category: putative protein 
Classification: no clue 

1 MATSTSTEAK SASWWNYFFL YDGSKVKEEG DPTRAGICYF YPSQTLLDQQ 
51 ELLCGQIAGV VRCVSDISDS PPTLVRLRKL KFAIKVDGDY LWVLGCAVEL 
101 PDVSCKRFLD QLVGFFNFYN GPVSLAYENC SQEELSTEWD TFIEQILKNT 
151 SDLHKIFNSL WNLDQTKVEP LLLLKAARIL QTCQRSPHIL AGCILYKGLI 
201 VSTQLPPSLT AKVLLHRTAP QEQRLPTGGD APQEHGAALP PNVQIIPVFV 
251 TKEEAISLHE FPVEQMTRSL ASPAGLQDGS AQHHPKGGST SALKENATGH 
301 VESMAWTTPD PTSPDEACPD GRKENGCLSG HDLESIRPAG LHNSARGEVL 
351 GLSSSLGKEL VFLQEELDLS EIHIPEAQEV EMASGHFAFL HVPVPDGRAP 
401 YCKASLSASS SLEPTPPEDT AISSLRPPSA PEMLTQHGAQ EQVEDHPGHS 
451 SQAPIPRADP LPRRTRRPLL LPRLDPGQRG NKLPTGEQGL DEDVDGVCES 
501 HAAPGLECSS GSANCQGAGP SADGISSRLT PAESCMGLVR MNLYTHCVKG 
551 LMLSLLAEEP LLGDSAAIEE VYHSSLASLN GLEVHLKETL PRDEAASTSS 
601 TYNFTYYDRI QSLLMANLPQ VATPHDRRFL QAVSLMHSEF AQLPALYEMT 
651 VRNASTAVYA CCNPIQETYF QQLAPAARSS GFPNPQDGAF SLSGKAKQKL 
701 LKHGVNLL 
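
The ORF coordinates and the peptide length given above are tied together by simple arithmetic: a 1-based, inclusive span from 618 bp to 2741 bp covers 2124 bases, which is 708 codons. A minimal Python sketch of that consistency check follows; the function name is illustrative and is not part of the original analysis pipeline.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Return the number of codons in an ORF.

    Coordinates are assumed to be 1-based and inclusive, as they
    appear to be in the 'ORF from ... bp to ... bp' lines.
    """
    span = end_bp - start_bp + 1
    if span % 3 != 0:
        raise ValueError(f"ORF span of {span} bp is not a multiple of 3")
    return span // 3


# ORF from 618 bp to 2741 bp; peptide length: 708
print(orf_peptide_length(618, 2741))
```

The same check can be applied to any other "ORF from ... to ..." line to catch OCR errors in the coordinates.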



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_20c21, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_20c21, frame 3 



Report for DKFZphtes3_20c21 . 3 
TGCAGATTGG 
GACCCAGAGC 
GGGACGGGCT 
TCTCCTGCTG 
GGCATTTTTT 
AACCCCCACA 
GGATCTGTGA 
AGTGCGCTCT 
ACACGGAGAC 
GGCGTCTTCT 
GAGTTTGGGC 
GAGAATTCAG 
GGCCGTGCAG 
AAGATGTCTG 
AGGGAGCGAC 
CTCAGTTCTA 
ACTGCAGGAT 
GTACTTGAGA 
GAACTGATAT 
TCCAAGGGCT 
AAAAAAAAAA 
AAAAAAAAAA 
AAAAAAA 






[LENGTH] 708 

[MW] 76900.23 

[pI] 5.30 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 6.36 % 

SEQ MATSTSTEAKSASWWNYFFLYDGSKVKEEGDPTRAGICYFYPSQTLLDQQELLCGQIAGV 

SEG .xxxxxxxxxxxx 

PRD ccccccccccccccceeeeeccccccccccccccccceeeeccchhhhhhhhhhhcccee 

SEQ VRCVSDISDSPPTLVRLRKLKFAIKVDGDYLWVLGCAVELPDVSCKRFLDQLVGFFNFYN 

SEG 

PRD eeeeeeccccccchhhhhhhhheeeeccceeeeeeeeeecccccchhhhhhhhheeeecc 

SEQ GPVSLAYENCSQEELSTEWDTFIEQILKNTSDLHKIFNSLWNLDQTKVEPLLLLKAARIL 

SEG 

PRD ccccccccccchhhhhhhhhhhhhhhhhhcchhhhhhhcccccccccchhhhhhhhhhhh 

SEQ QTCQRSPHILAGCILYKGLIVSTQLPPSLTAKVLLHRTAPQEQRLPTGGDAPQEHGAALP 

SEG 

PRD hhhhccccchhhhhhhcccccccccccchhhhhhhhhccccccccccccccccccccccc 

SEQ PNVQIIPVFVTKEEAISLHEFPVEQMTRSLASPAGLQDGSAQHHPKGGSTSALKENATGH 

SEG 

PRD ccceeeeeeeecccceeeccccchhhhhhhccccccccccccccccccchhhhhhhcccc 

SEQ VESMAWTTPDPTSPDEACPDGRKENGCLSGHDLESIRPAGLHNSARGEVLGLSSSLGKEL 

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccccccceeeeeccccchhh 

SEQ VFLQEELDLSEIHIPEAQEVEMASGHFAFLHVPVPDGRAPYCKASLSASSSLEPTPPEDT 

SEG 

PRD hhhhhhhcccccccccchhhhhhccceeeeeecccccccceeeccccccccccccccccc 

SEQ AISSLRPPSAPEMLTQHGAQEQVEDHPGHSSQAPIPRADPLPRRTRRPLLLPRLDPGQRG 

SEG xxxxxxxxxxxxxxxxxxxxx. . . . 

PRD cccccccccchhhhhhccccceeecccccccccccccccccccccccccccccccccccc 

SEQ NKLPTGEQGLDEDVDGVCESHAAPGLECSSGSANCQGAGPSADGISSRLTPAESCMGLVR 

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccccccccccccccceeeee 

SEQ MNLYTHCVKGLMLSLLAEEPLLGDSAAIEEVYHSSLASLNGLEVHLKETLPRDEAASTSS 

SEG xxxxxxxxxxxx 

PRD ceeeeeeehhhhhhhhhccccccchhhhhhhhhhccccccchhhhhhhcccccccccccc 

SEQ TYNFTYYDRIQSLLMANLPQVATPHDRRFLQAVSLMHSEFAQLPALYEMTVRNASTAVYA 

SEG 

PRD ccceeeehhhhhhhhhcccccccccchhhhhhhhhhhhhhhcchhhhhhhhhccceeeee 

SEQ CCNPIQETYFQQLAPAARSSGFPNPQDGAFSLSGKAKQKLLKHGVNLL 

SEG 

PRD eccchhhhhhhhhhhhhhhcccccccccceeecchhhhhhhhhccccc 

(No Prosite data available for DKFZphtes3_20c21 . 3) 
(No Pfam data available for DKFZphtes3_20c21 . 3) 
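
The [MW] value in a report of this kind can be approximated from the sequence by summing average residue masses and adding one water molecule. The sketch below is a rough cross-check only: the mass table is the commonly used set of average residue masses, and it is an assumption that the original tool used comparable values, so small differences from the reported figure are to be expected.

```python
# Average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water for the free N- and C-termini


def molecular_weight(peptide: str) -> float:
    """Approximate average molecular weight of an unmodified peptide."""
    residues = peptide.replace(" ", "").upper()
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in residues) + WATER_MASS


# First ten residues of the peptide reported above.
print(round(molecular_weight("MATSTSTEAK"), 2))
```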
DKFZphtes3_20k2 



group: signal transduction 

DKFZphtes3_20k2 encodes a novel 839 amino acid protein with strong similarity to rat vanilloid 
receptor subtype 1. 

VR1 seems to play an important role in the activation and sensitization of nociceptors. It is 
the receptor for, e.g., capsaicin, a selective activator of nociceptors and a natural product of 
capsicum peppers. The novel protein is the human orthologue of rat VR1. 

The new protein can find application as a target for the development of new 
nociception-modulating drugs. 



strong similarity to rat vanilloid receptor subtype 1 

Sequenced by MediGenomix 

Locus : unknown 

Insert length: 4187 bp 

Poly A stretch at pos. 4154, polyadenylation signal at pos. 4135 
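
Positions such as the poly A stretch and the polyadenylation signal can be located with a simple scan of the insert sequence. The following Python sketch is illustrative only: it assumes the canonical AATAAA hexamer and a minimum run of 15 A residues at the 3' end, and the exact criteria and coordinate convention used to produce the positions quoted above are not known.

```python
import re


def find_polya_features(seq: str, min_run: int = 15):
    """Return (signal_pos, tail_pos) as 1-based positions, or None if absent.

    tail_pos   : start of a terminal run of at least `min_run` A residues
    signal_pos : start of the last AATAAA hexamer upstream of that run
    """
    seq = seq.upper().replace(" ", "")
    tail = re.search(r"A{%d,}$" % min_run, seq)
    tail_start = tail.start() if tail else len(seq)
    signal_idx = seq.rfind("AATAAA", 0, tail_start)
    signal_pos = signal_idx + 1 if signal_idx >= 0 else None
    tail_pos = tail_start + 1 if tail else None
    return signal_pos, tail_pos


# Hypothetical toy insert: signal at position 5, tail starting at position 24.
toy = "GGCC" + "AATAAA" + "CAGATATGTATAC" + "A" * 20
print(find_polya_features(toy))
```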



1 GGCTCAGGCA GGCCTGGCCC AGAGTCACGC TGGCAACCAC GAGTTTGGGA 
51 AGCAGTCGTA TTCTCTCTCT CTCTCTCTCT CTCTCAGTAT CCATGACAGT 
101 GTGATGGAGA GTCTCTGCCG TGCCATCTGG GATGCAAACC GTCCCTGTGT 
151 CCCCCACGTC CAGGCCGTAG ATGCTCCCCG CCGGTCAGTC ACTTAGTCGT 
201 CAGATCGCCC GTCCTGGTAT CACAGTGCTT CTGTTCAGGT TGCACACTGG 
251 GCCACAGAGG ATCCAGCAAG GATGAAGAAA TGGAGCAGCA CAGACTTGGG 
301 GGCAGCTGCG GACCCACTCC AAAAGGACAC CTGCCCAGAC CCCCTGGATG 
351 GAGACCCTAA CTCCAGGCCA CCTCCAGCCA AGCCCCAGCT CTCCACGGCC 
401 AAGAGCCGCA CCCGGCTCTT TGGGAAGGGT GACTCGGAGG AGGCTTTCCC 
451 GGTGGATTGC CCTCACGAGG AAGGTGAGCT GGACTCCTGC CCGACCATCA 
501 CAGTCAGCCC TGTTATCACC ATCCAGAGGC CAGGAGACGG CCCCACCGGT 
551 GCCAGGCTGC TGTCCCAGGA CTCTGTCGCC GCCAGCACCG AGAAGACCCT 
601 CAGGCTCTAT GATCGCAGGA GTATCTTTGA AGCCGTTGCT CAGAATAACT 
651 GCCAGGATCT GGAGAGCCTG CTGCTCTTCC TGCAGAAGAG CAAGAAGCAC 
701 CTCACAGACA ACGAGTTCAA AGACCCTGAG ACAGGGAAGA CCTGTCTGCT 
751 GAAAGCCATG CTCAACCTGC ATGACGGACA GAACACCACC ATCCCCCTGC 
801 TCCTGGAGAT CGCGCGGCAA ACGGACAGCC TGAAGGAGCT TGTCAACGCC 
851 AGCTACACGG ACAGCTACTA CAAGGGCCAG ACAGCACTGC ACATCGCCAT 
901 CGAGAGACGC AACATGGCCC TGGTGACCCT CCTGGTGGAG AACGGAGCAG 
951 ACGTCCAGGC TGCGGCCCAT GGGGACTTCT TTAAGAAAAC CAAAGGGCGG 
1001 CCTGGATTCT ACTTCGGTGA ACTGCCCCTG TCCCTGGCCG CGTGCACCAA 
1051 CCAGCTGGGC ATCGTGAAGT TCCTGCTGCA GAACTCCTGG CAGACGGCCG 
1101 ACATCAGCGC CAGGGACTCG GTGGGCAACA CGGTGCTGCA CGCCCTGGTG 
1151 GAGGTGGCCG ACAACACGGC CGACAACACG AAGTTTGTGA CGAGCATGTA 
1201 CAATGAGATT CTGATCCTGG GGGCCAAACT GCACCCGACG CTGAAGCTGG 
1251 AGGAGCTCAC CAACAAGAAG GGAATGACGC CGCTGGCTCT GGCAGCTGGG 
1301 ACCGGGAAGA TCGGGGTCTT GGCCTATATT CTCCAGCGGG AGATCCAGGA 
1351 GCCCGAGTGC AGGCACCTGT CCAGGAAGTT CACCGAGTGG GCCTACGGGC 
1401 CCGTGCACTC CTCGCTGTAC GACCTGTCCT GCATCGACAC CTGCGAGAAG 
1451 AACTCGGTGC TGGAGGTGAT CGCCTACAGC AGCAGCGAGA CCCCTAATCG 
1501 CCACGACATG CTCTTGGTGG AGCCGCTGAA CCGACTCCTG CAGGACAAGT 
1551 GGGACAGATT CGTCAAGCGC ATCTTCTACT TCAACTTCCT GGTCTACTGC 
1601 CTGTACATGA TCATCTTCAC CATGGCTGCC TACTACAGGC CCGTGGATGG 
1651 CTTGCCTCCC TTTAAGATGG AAAAAATTGG AGACTATTTC CGAGTTACTG 
1701 GAGAGATCCT GTCTGTGTTA GGAGGAGTCT ACTTCTTTTT CCGAGGGATT 
1751 CAGTATTTCC TGCAGAGGCG GCCGTCGATG AAGACCCTGT TTGTGGACAG 
1801 CTACAGTGAG ATGCTTTTCT TTCTGCAGTC ACTGTTCATG CTGGCCACCG 
1851 TGGTGCTGTA CTTCAGCCAC CTCAAGGAGT ATGTGGCTTC CATGGTATTC 
1901 TCCCTGGCCT TGGGCTGGAC CAACATGCTC TACTACACCC GCGGTTTCCA 
1951 GCAGATGGGC ATCTATGCCG TCATGATAGA GAAGATGATC CTGAGAGACC 
2001 TGTGCCGTTT CATGTTTGTC TACATCGTCT TCTTGTTCGG GTTTTCCACA 
2051 GCGGTGGTGA CGCTGATTGA AGACGGGAAG AATGACTCCC TGCCGTCTGA 
2101 GTCCACGTCG CACAGGTGGC GGGGGCCTGC CTGCAGGCCC CCCGATAGCT 
2151 CCTACAACAG CCTGTACTCC ACCTGCCTGG AGCTGTTCAA GTTCACCATC 
2201 GGCATGGGCG ACCTGGAGTT CACTGAGAAC TATGACTTCA AGGCTGTCTT 
2251 CATCATCCTG CTGCTGGCCT ATGTAATTCT CACCTACATC CTCCTGCTCA 
2301 ACATGCTCAT CGCCCTCATG GGTGAGACTG TCAACAAGAT CGCACAGGAG 
2351 AGCAAGAACA TCTGGAAGCT GCAGAGAGCC ATCACCATCC TGGACACGGA 
2401 GAAGAGCTTC CTTAAGTGCA TGAGGAAGGC CTTCCGCTCA GGCAAGCTGC 
2451 TGCAGGTGGG GTACACACCT GATGGCAAGG ACGACTACCG GTGGTGCTTC 
2501 AGGGTGGACG AGGTGAACTG GACCACCTGG AACACCAACG TGGGCATCAT 
2551 CAACGAAGAC CCGGGCAACT GTGAGGGCGT CAAGCGCACC CTGAGCTTCT 
2601 CCCTGCGGTC AAGCAGAGTT TCAGGCAGAC ACTGGAAGAA CTTTGCCCTG 
2651 GTCCCCCTTT TAAGAGAGGC AAGTGCTCGA GATAGGCAGT CTGCTCAGCC 
2701 CGAGGAAGTT TATCTGCGAC AGTTTTCAGG GTCTCTGAAG CCAGAGGACG 

2751 CTGAGGTCTT CAAGAGTCCT GCCGCTTCCG GGGAGAAGTG AGGACGTCAC 

2801 GCAGACAGCA CTGTCAACAC TGGGCCTTAG GAGACCCCGT TGCCACGGGG 

2851 GGCTGCTGAG GGAACACCAG TGCTCTGTCA GCAGCCTGGC CTGGTCTGTG 

2901 CCTGCCCAGC ATGTTCCCAA ATCTGTGCTG GACAAGCTGT GGGAAGCGTT 

2951 CTTGGAAGCA TGGGGAGTGA TGTACATCCA ACCGTCACTG TCCCCAAGTG 

3001 AATCTCCTAA CAGACTTTCA GGTTTTTACT CACTTTACTA AACAGTTTGG 

3051 ATGGTCAGTC TCTACTGGGA CATGTTAGGC CCTTGTTTTC TTTGATTTTA 

3101 TTCTTTTTTT TGAGACAGAA TTTCACTCTT CTCACCCAGG CTGGAATGCA 

3151 GTGGCACAAT TTTGGCTCCC TGCAACCTCC GCCTCCTGGA TTCCAGCAAT 

3201 TCTCCTGCCT CGGCTTCCCA AGTAGCTGGG ATTACAGGCA CGTGCCACCA 

3251 TGTCTGGCTA ATTTTTTGTA TTTTTTTAAT AGATATGGGG TTTCGCCATG 

3301 TTGGCCAGGC TGGTCTCGAA CTCCTGACCT CAGGTGATCC GCCCACCTCG 

3351 GCCTCCCAAA GTGCTGGGAT TACAGGTGTG AGCCTCCACA CCTGGCTGTT 

3401 TTCTTTGATT TTATTCTTTT TTTTTTTTCT GTGAGACAGA GTTTCACTCT 

3451 TGTTGCCCAG GCTGGAGTGC AGTGGTGTGA TCTTGGCTCA CTGCAACCTC 

3501 TGCCTCCCGG GTTCAAGCGA TTCTTCTGCT TCAGTCTCCC AAGTAGCTTG 

3551 GATTACAGGT GAGCACTACC ACGCCCGGCT AATTTTTGTA TTTTTAATAG 

3601 AGACGGGGTT TCACCATGTT GGCCAGGCTG GTCTCGAACT CTTGACCTCA 

3651 GGTGATCTGC CCGCCTTGGC CTCCCAAAGT GCTGGGATTA CAGGTGTGAG 

3701 CCGCTGCGCT CGGCCTTCTT TGATTTTATA TTATTAGGAG CAAAAGTAAA 

3751 TGAAGCCCAG GAAAACACCT TTGGGAACAA ACTCTTCCTT TGATGGAAAA 

3801 TGCAGAGGCC CTTCCTCTCT GTGCCGTGCT TGCTCCTCTT ACCTGCCCGG 

3851 GTGGTTTGGG GGTGTTGGTG TTTCCTCCCT GGAGAAGATG GGGGAGGCTG 

3901 TCCCACTCCC AGCTCTGGCA GAATCAAGCT GTTGCAGCAG TGCCTTCTTC 

3951 ATCCTTCCTT ACGATCAATC ACAGTCTCCA GAAGATCAGC TCAATTGCTG 

4001 TGCAGGTTAA AACTACAGAA CCACATCCCA AAGGTACCTG GTAAGAATGT 

4051 TTGAAAGATC TTCCATTTCT AGGAACCCCA GTCCTGCTTC TCCGCAATGG 

4101 CACATGCTTC CACTCCATCC ATACTGGCAT CCTCAAATAA ACAGATATGT 

4151 ATACATATAA AAAAAAAAAA AAAAAAAAAA AAAAAAA 
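
A numbered listing such as the one above can be checked mechanically: each row label should equal the number of residues printed so far plus one. The Python sketch below joins the blocks into one string and reports rows whose label does not match; it assumes every non-blank row starts with an integer label followed by whitespace-separated blocks, and the function name is hypothetical.

```python
def parse_numbered_listing(lines):
    """Join numbered sequence rows and flag inconsistent row labels.

    Returns (sequence, problems) where problems is a list of
    (label_found, label_expected) tuples.
    """
    parts = []
    problems = []
    expected = 1
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank spacer lines
        label = int(fields[0])
        residues = "".join(fields[1:])
        if label != expected:
            problems.append((label, expected))
        parts.append(residues)
        expected += len(residues)
    return "".join(parts), problems


rows = ["1 ACGTACGTAC GGGGGCCCCC", "", "21 TTTTT"]
sequence, problems = parse_numbered_listing(rows)
print(len(sequence), problems)
```

A mislabelled row shows up in `problems` together with the label that was expected at that point.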



BLAST Results 



No BLAST result 



Medline entries 



99288727: 

Recent advances in neuropharmacology of cutaneous nociceptors. 
99231880: 

A non-pungent triprenyl phenol of fungal origin, scutigeral, stimulates 
rat dorsal root ganglion 

neurons via interaction at vanilloid receptors. 



Peptide information for frame 2 



ORF from 272 bp to 2788 bp; peptide length: 839 
Category: strong similarity to known protein 
Classification: Cell signaling/communication 

1 MKKWSSTDLG AAADPLQKDT CPDPLDGDPN SRPPPAKPQL STAKSRTRLF 

51 GKGDSEEAFP VDCPHEEGEL DSCPTITVSP VITIQRPGDG PTGARLLSQD 

101 SVAASTEKTL RLYDRRSIFE AVAQNNCQDL ESLLLFLQKS KKHLTDNEFK 

151 DPETGKTCLL KAMLNLHDGQ NTTIPLLLEI ARQTDSLKEL VNASYTDSYY 

201 KGQTALHIAI ERRNMALVTL LVENGADVQA AAHGDFFKKT KGRPGFYFGE 

251 LPLSLAACTN QLGIVKFLLQ NSWQTADISA RDSVGNTVLH ALVEVADNTA 

301 DNTKFVTSMY NEILILGAKL HPTLKLEELT NKKGMTPLAL AAGTGKIGVL 

351 AYILQREIQE PECRHLSRKF TEWAYGPVHS SLYDLSCIDT CEKNSVLEVI 

401 AYSSSETPNR HDMLLVEPLN RLLQDKWDRF VKRIFYFNFL VYCLYMIIFT 

451 MAAYYRPVDG LPPFKMEKIG DYFRVTGEIL SVLGGVYFFF RGIQYFLQRR 

501 PSMKTLFVDS YSEMLFFLQS LFMLATVVLY FSHLKEYVAS MVFSLALGWT 

551 NMLYYTRGFQ QMGIYAVMIE KMILRDLCRF MFVYIVFLFG FSTAVVTLIE 

601 DGKNDSLPSE STSHRWRGPA CRPPDSSYNS LYSTCLELFK FTIGMGDLEF 

651 TENYDFKAVF IILLLAYVIL TYILLLNMLI ALMGETVNKI AQESKNIWKL 

701 QRAITILDTE KSFLKCMRKA FRSGKLLQVG YTPDGKDDYR WCFRVDEVNW 

751 TTWNTNVGII NEDPGNCEGV KRTLSFSLRS SRVSGRHWKN FALVPLLREA 

801 SARDRQSAQP EEVYLRQFSG SLKPEDAEVF KSPAASGEK 



BLASTP hits 
No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_20k2, frame 2 

TREMBL:AF029310_1 product: "vanilloid receptor subtype 1"; Rattus 
norvegicus vanilloid receptor subtype 1 mRNA, complete cds., N = 1, 
Score = 3760, P = 0 

TREMBLNEW:AB015231_1 product: "stretch-inhibitable nonselective channel 
(SIC)"; Rattus norvegicus mRNA for stretch-inhibitable nonselective 
channel (SIC), complete cds., N = 2, Score = 2090, P = 2e-219 



>TREMBL:AF029310_1 product: "vanilloid receptor subtype 1"; Rattus 
norvegicus vanilloid receptor subtype 1 mRNA, complete cds. 
Length = 838 

HSPs: 

Score = 3760 (564.1 bits), Expect = 0.0e+00, P = 0.0e+00 
Identities = 721/839 (85%), Positives = 773/839 (92%) 

Query: 1 MKKWSSTDLGAAADPLQKDTCPDPLDGDPNSRPPPAKPQLSTAKSRTRLFGKGDSEEAFP 60 

M++ +S D + P Q+++C DP D DPN +PPP KP + T +SRTRLFGKGDSEEA P 
Sbjct: 1 MEQRASLDSEESESPPQENSCLDPPDRDPNCKPPPVKPHIFTTRSRTRLFGKGDSEEASP 60 

Query: 61 VDCPHEEGELDSCPTITVSPVITIQRPGDGPTGARLLSQDSVAASTEKTLRLYDRRSIFE 120 

+DCP+EEG L SCP ITVS V+TIQRPGDGP R SQDSV+A EK RLYDRRSIF+ 
Sbjct: 61 LDCPYEEGGLASCPIITVSSVLTIQRPGDGPASVRPSSQDSVSAG-EKPPRLYDRRSIFD 119 

Query: 121 AVAQNNCQDLESLLLFLQKSKKHLTDNEFKDPETGKTCLLKAMLNLHDGQNTTIPLLLEI 180 

AVAQ+NCQ+LESLL FLQ+SKK LTD+EFKDPETGKTCLLKAMLNLH+GQN TI LLL++ 
Sbjct: 120 AVAQSNCQELESLLPFLQRSKKRLTDSEFKDPETGKTCLLKAMLNLHNGQNDTIALLLDV 179 

Query: 181 ARQTDSLKELVNASYTDSYYKGQTALHIAIERRNMALVTLLVENGADVQAAAHGDFFKKT 240 

AR+TDSLK+ VNASYTDSYYKGQTALHIAIERRNM LVTLLVENGADVQAAA+GDFFKKT 
Sbjct: 180 ARKTDSLKQFVNASYTDSYYKGQTALHIAIERRNMTLVTLLVENGADVQAAANGDFFKKT 239 

Query: 241 KGRPGFYFGELPLSLAACTNQLGIVKFLLQNSWQTADISARDSVGNTVLHALVEVADNTA 300 

KGRPGFYFGELPLSLAACTNQL IVKFLLQNSWQ ADISARDSVGNTVLHALVEVADNT 
Sbjct: 240 KGRPGFYFGELPLSLAACTNQLAIVKFLLQNSWQPADISARDSVGNTVLHALVEVADNTV 299 

Query: 301 DNTKFVTSMYNEILILGAKLHPTLKLEELTNKKGMTPLALAAGTGKIGVLAYILQREIQE 360 

DNTKFVTSMYNEILILGAKLHPTLKLEE+TN+KG+TPLALAA +GKIGVLAYILQREI E 
Sbjct: 300 DNTKFVTSMYNEILILGAKLHPTLKLEEITNRKGLTPLALAASSGKIGVLAYILQREIHE 359 

Query: 361 PECRHLSRKFTEWAYGPVHSSLYDLSCIDTCEKNSVLEVIAYSSSETPNRHDMLLVEPLN 420 

PECRHLSRKFTEWAYGPVHSSLYDLSCIDTCEKNSVLEVIAYSSSETPNRHDMLLVEPLN 
Sbjct: 360 PECRHLSRKFTEWAYGPVHSSLYDLSCIDTCEKNSVLEVIAYSSSETPNRHDMLLVEPLN 419 

Query: 421 RLLQDKWDRFVKRIFYFNFLVYCLYMIIFTMAAYYRPVDGLPPFKMEK-IGDYFRVTGEI 479 

RLLQDKWDRFVKRIFYFNF VYCLYMIIFT AAYYRPV+GLPP+K++ +GDYFRVTGEI 
Sbjct: 420 RLLQDKWDRFVKRIFYFNFFVYCLYMIIFTAAAYYRPVEGLPPYKLKNTVGDYFRVTGEI 479 

Query: 480 LSVLGGVYFFFRGIQYFLQRRPSMKTLFVDSYSEMLFFLQSLFMLATVVLYFSHLKEYVA 539 

LSV GGVYFFFRGIQYFLQRRPS+K+LFVDSYSE+LFF+QSLFML +VVLYFS KEYVA 
Sbjct: 480 LSVSGGVYFFFRGIQYFLQRRPSLKSLFVDSYSEILFFVQSLFMLVSVVLYFSQRKEYVA 539 

Query: 540 SMVFSLALGWTNMLYYTRGFQQMGIYAVMIEKMILRDLCRFMFVYIVFLFGFSTAVVTLI 599 

SMVFSLA+GWTNMLYYTRGFQQMGIYAVMIEKMILRDLCRFMFVY+VFLFGFSTAVVTLI 
Sbjct: 540 SMVFSLAMGWTNMLYYTRGFQQMGIYAVMIEKMILRDLCRFMFVYLVFLFGFSTAVVTLI 599 

Query: 600 EDGKNDSLPSESTSHRWRGPACRPPDSSYNSLYSTCLELFKFTIGMGDLEFTENYDFKAV 659 

EDGKN+SLP EST H+ RG AC+P +SYNSLYSTCLELFKFTIGMGDLEFTENYDFKAV 
Sbjct: 600 EDGKNNSLPMESTPHKCRGSACKP-GNSYNSLYSTCLELFKFTIGMGDLEFTENYDFKAV 658 

Query: 660 FIILLLAYVILTYILLLNMLIALMGETVNKIAQESKNIWKLQRAITILDTEKSFLKCMRK 719 

FIILLLAYVILTYILLLNMLIALMGETVNKIAQESKNIWKLQRAITILDTEKSFLKCMRK 
Sbjct: 659 FIILLLAYVILTYILLLNMLIALMGETVNKIAQESKNIWKLQRAITILDTEKSFLKCMRK 718 

Query: 720 AFRSGKLLQVGYTPDGKDDYRWCFRVDEVNWTTWNTNVGIINEDPGNCEGVKRTLSFSLR 779 

AFRSGKLLQVG+TPDGKDDYRWCFRVDEVNWTTWNTNVGIINEDPGNCEGVKRTLSFSLR 
Sbjct: 719 AFRSGKLLQVGFTPDGKDDYRWCFRVDEVNWTTWNTNVGIINEDPGNCEGVKRTLSFSLR 778 

Query: 780 SSRVSGRHWKNFALVPLLREASARDRQSAQPEEVYLRQFSGSLKPEDAEVFKSPAASGEK 839 

S RVSGR+WKNFALVPLLR+AS RDR + Q EEV L+ ++GSLKPEDAEVFK GEK 
Sbjct: 779 SGRVSGRNWKNFALVPLLRDASTRDRHATQQEEVQLKHYTGSLKPEDAEVFKDSMVPGEK 838 
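
The percentages in an HSP header can be recomputed from the raw counts, which is a quick way to spot OCR damage in these lines. The Python sketch below parses an "Identities = a/b (p%), Positives = c/d (q%)" line and recomputes both percentages. It assumes the percentages are truncated rather than rounded, which is consistent with the figures in this document (for example 721/839 is about 85.9% and is reported as 85%), but that convention is inferred, not documented here.

```python
import re

HSP_LINE = re.compile(
    r"Identities = (\d+)/(\d+) \((\d+)%\), Positives = (\d+)/(\d+) \((\d+)%\)"
)


def check_hsp_percentages(line: str):
    """Return (identities_ok, positives_ok) for one HSP header line."""
    match = HSP_LINE.search(line)
    if match is None:
        raise ValueError("not an Identities/Positives line")
    ident, n1, ident_pct, pos, n2, pos_pct = map(int, match.groups())
    # Integer division truncates, matching the assumed convention.
    return (100 * ident // n1 == ident_pct, 100 * pos // n2 == pos_pct)


print(check_hsp_percentages(
    "Identities = 721/839 (85%), Positives = 773/839 (92%)"
))
```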
Report for DKFZphtes3_20k2.2 



[LENGTH] 839 

[MW] 94950.75 

[pI] 6.90 

[HOMOL] TREMBL:AF029310_1 product: "vanilloid receptor subtype 1"; Rattus norvegicus 
vanilloid receptor subtype 1 mRNA, complete cds. 0.0 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YIL112w] 4e-05 

[PIRKW] alternative splicing 3e-06 

[PIRKW] peripheral membrane protein 3e-06 

[SUPFAM] ankyrin repeat homology 3e-06 

[SUPFAM] unassigned ankyrin repeat proteins 3e-06 

[PFAM] Ank repeat 

[KW] TRANSMEMBRANE 4 



SEQ MKKWSSTDLGAAADPLQKDTCPDPLDGDPNSRPPPAKPQLSTAKSRTRLFGKGDSEEAFP 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

MEM 

SEQ VDCPHEEGELDSCPTITVSPVITIQRPGDGPTGARLLSQDSVAASTEKTLRLYDRRSIFE 

PRD cccccccccccccccccceeeeeeecccccccceeeccccccccccchhhhhhhhhhhhh 

MEM 

SEQ AVAQNNCQDLESLLLFLQKSKKHLTDNEFKDPETGKTCLLKAMLNLHDGQNTTIPLLLEI 

PRD hhhhcchhhhhhhhhhhhhhcccccccccccccccchhhhhhhhhhccccccchhhhhhh 

MEM 

SEQ ARQTDSLKELVNASYTDSYYKGQTALHIAIERRNMALVTLLVENGADVQAAAHGDFFKKT 

PRD hhhcccccccccccccccccccchhhhhhhhhcchhhhhhhhhccceeeccccccccccc 

MEM 

SEQ KGRPGFYFGELPLSLAACTNQLGIVKFLLQNSWQTADISARDSVGNTVLHALVEVADNTA 

PRD ccccceeeccccchhhhhhcchhhhhhhhhcccccccccccccccchhhhhhhhhhcccc 

MEM 

SEQ DNTKFVTSMYNEILILGAKLHPTLKLEELTNKKGMTPLALAAGTGKIGVLAYILQREIQE 

PRD chhhhhhhhhhhhhhhccccccceeeeeecccccccchhhhhhhcchhhhhhhhhhhhhc 

MEM 

SEQ PECRHLSRKFTEWAYGPVHSSLYDLSCIDTCEKNSVLEVIAYSSSETPNRHDMLLVEPLN 

PRD ccccchhhhhheeeccceeeeeeeccccccccccccceeeeeccccccccceeeeehhhh 

MEM 

SEQ RLLQDKWDRFVKRIFYFNFLVYCLYMIIFTMAAYYRPVDGLPPFKMEKIGDYFRVTGEIL 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccccchhhhhhhhc 

MEM MMMMMMMMMMMMMMMMM 

SEQ SVLGGVYFFFRGIQYFLQRRPSMKTLFVDSYSEMLFFLQSLFMLATVVLYFSHLKEYVAS 

PRD cccceeeeeecchhhhhhhhheeeeeeccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMM 

SEQ MVFSLALGWTNMLYYTRGFQQMGIYAVMIEKMILRDLCRFMFVYIVFLFGFSTAVVTLIE 

PRD hhhhhhhhhhhhheeecccccccchhhhhhhhhhhhhhhhhhhheeecccccceeeeeec 

MEM MMMMMMMMMMMMMMMMM. 

SEQ DGKNDSLPSESTSHRWRGPACRPPDSSYNSLYSTCLELFKFTIGMGDLEFTENYDFKAVF 

PRD cccccccccccccccccccccccccccccchhhhhhhhhhhhhccccchhhhhhhhhhhh 

MEM MM 

SEQ IILLLAYVILTYILLLNMLIALMGETVNKIAQESKNIWKLQRAITILDTEKSFLKCMRKA 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMM 

SEQ FRSGKLLQVGYTPDGKDDYRWCFRVDEVNWTTWNTNVGIINEDPGNCEGVKRTLSFSLRS 

PRD hhcceeeeeecccccccccceeeeeeecccccccccceeeecccccccceeeeeeeeeec 

MEM 

SEQ SRVSGRHWKNFALVPLLREASARDRQSAQPEEVYLRQFSGSLKPEDAEVFKSPAASGEK 

PRD ccccccccccchhhhhhhhhhhhhhhcccccceeeeecccccccccceeeecccccccc 

MEM 



(No Prosite data available for DKFZphtes3_20k2 . 2) 
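
The [KW] TRANSMEMBRANE count in a report like this corresponds to the number of contiguous runs of M in the MEM track, where a run that continues from the end of one 60-column row into the start of the next counts once. The Python sketch below illustrates the idea on hypothetical, correctly padded rows; in this copy the leading whitespace of the MEM rows has been lost, so the actual segment coordinates cannot be read off the text directly.

```python
import re


def membrane_segments(mem_rows, width: int = 60):
    """Return 1-based (start, end) spans of contiguous M runs.

    Each row is padded to `width` columns so that a run crossing a
    row boundary is merged into a single segment.
    """
    track = "".join(row.ljust(width) for row in mem_rows)
    return [(m.start() + 1, m.end()) for m in re.finditer(r"M+", track)]


# Hypothetical rows: one segment inside row 1, one spanning rows 2 and 3.
rows = [
    " " * 10 + "M" * 17,
    " " * 58 + "MM",
    "M" * 15,
]
print(membrane_segments(rows))
```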



Pfam for DKFZphtes3_20k2 . 2 
HMM_NAME Ank repeat 

HMM       *GyTPLHIAARyNNvEMVrlLLQHGADIN* 
           G+T+LHIA +++N+ +V LL+++GAD+ 
Query 202  GQTALHIAIERRNMALVTLLVENGADVQ 229 
DKFZphtes3_2013 



group: transmembrane protein 

DKFZphtes3_2013 encodes a novel 595 amino acid protein with partial similarity to the IL-17 
receptor. 

The novel protein contains one transmembrane region. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes and as a new marker for testicular cells. 



similarity to IL-17 receptor 
Sequenced by MediGenomix 
Locus: unknown 
Insert length: 2406 bp 

Poly A stretch at pos. 2345, no polyadenylation signal found 



1 GCCTCAGGTG TTCCTGCGTT GTTTGTCAGT GGAGAGCAGG GAGTGGGGCC 
51 AGCCAGCAGA AACAGTGGGC TGTACAACAT CACCTTCAAA TATGACAATT 
101 GTACCACCTA CTTGAATCCA GTGGGGAAGC ATGTGATTGC TGACGCCCAG 
151 AATATCACCA TCAGCCAGTA TGCTTGCCAT GACCAAGTGG CAGTCACCAT 
201 TCTTTGGTCC CCAGGGGCCC TCGGCATCGA ATTCCTGAAA GGATTTCGGG 
251 TAATACTGGA GGAGCTGAAG TCGGAGGGAA GACAGTGCCA ACAACTGATT 
301 CTAAAGGATC CGAAGCAGCT CAACAGTAGC TTCAAAAGAA CTGGAATGGA 
351 ATCTCAACCT TTCCTGAATA TGAAATTTGA AACGGATTAT TTCGTAAAGG 
401 TTGTCCCTTT TCCTTCCATT AAAAACGAAA GCAATTACCA CCCTTTCTTC 
451 TTTAGAACCC GAGCCTGTGA CCTGTTGTTA CAGCCGGACA ATCTAGCTTG 
501 TAAACCCTTC TGGAAGCCTC GGAACCTGAA CATCAGCCAG CATGGCTCGG 
551 ACATGCAGGT GTCCTTCGAC CACGCACCGC ACAACTTCGG CTTCCGTTTC 
601 TTCTATCTTC ACTACAAGCT CAAGCACGAA GGACCTTTCA AGCGAAAGAC 
651 CTGTAAGCAG GAGCAAACTA CAGAGATGAC CAGCTGCCTC CTTCAAAATG 
701 TTTCTCCAGG GGATTATATA ATTGAGCTGG TGGATGACAC TAACACAACA 
751 AGAAAAGTGA TGCATTATGC CTTAAAGCCA GTGCACTCCC CGTGGGCCGG 
801 GCCCATCAGA GCCGTGGCCA TCACAGTGCC ACTGGTAGTC ATATCGGCAT 
851 TCGCGACGCT CTTCACTGTG ATGTGCCGCA AGAAGCAACA AGAAAATATA 
901 TATTCACATT TAGATGAAGA GAGCTCTGAG TCTTCCACAT ACACTGCAGC 
951 ACTCCCAAGA GAGAGGCTCC GGCCGCGGCC GAAGGTCTTT CTCTGCTATT 
1001 CCAGTAAAGA TGGCCAGAAT CACATGAATG TCGTCCAGTG TTTCGCCTAC 
1051 TTCCTCCAGG ACTTCTGTGG CTGTGAGGTG GCTCTGGACC TGTGGGAAGA 
1101 CTTCAGCCTC TGTAGAGAAG GGCAGAGAGA ATGGGTCATC CAGAAGATCC 
1151 ACGAGTCCCA GTTCATCATT GTGGTTTGTT CCAAAGGTAT GAAGTACTTT 
1201 GTGGACAAGA AGAACTACAA ACACAAAGGA GGTGGCCGAG GCTCGGGGAA 
1251 AGGAGAGCTC TTCCTGGTGG CGGTGTCAGC CATTGCCGAA AAGCTCCGCC 
1301 AGGCCAAGCA GAGTTCGTCC GCGGCGCTCA GCAAGTTTAT CGCCGTCTAC 
1351 TTTGATTATT CCTGCGAGGG AGACGTCCCC GGTATCCTAG ACCTGAGTAC 
1401 CAAGTACAGA CTCATGGACA ATCTTCCTCA GCTCTGTTCC CACCTGCACT 
1451 CCCGAGACCA CGGCCTCCAG GAGCCGGGGC AGCACACGCG ACAGGGCAGC 
1501 AGAAGGAACT ACTTCCGGAG CAAGTCAGGC CGGTCCCTAT ACGTCGCCAT 
1551 TTGCAACATG CACCAGTTTA TTGACGAGGA GCCCGACTGG TTCGAAAAGC 
1601 AGTTCGTTCC CTTCCATCCT CCTCCACTGC GCTACCGGGA GCCAGTCTTG 
1651 GAGAAATTTG ATTCGGGCTT GGTTTTAAAT GATGTCATGT GCAAACCAGG 
1701 GCCTGAGAGT GACTTCTGCC TAAAGGTAGA GGCGGCTGTT CTTGGGGCAA 
1751 CCGGACCAGC CGACTCCCAG CACGAGAGTC AGCATGGGGG CCTGGACCAA 
1801 GACGGGGAGG CCCGGCCTGC CCTTGACGGT AGCGCCGCCC TGCAACCCCT 
1851 GCTGCACACG GTGAAAGCCG GCAGCCCCTC GGACATGCCG CGGGACTCAG 
1901 GCATCTATGA CTCGTCTGTG CCCTCATCCG AGCTGTCTCT GCCACTGATG 
1951 GAAGGACTCT CGACGGACCA GACAGAAACG TCTTCCCTGA CGGAGAGCGT 
2001 GTCCTCCTCT TCAGGCCTGG GTGAGGAGGA ACCTCCTGCC CTTCCTTCCA 
2051 AGCTCCTCTC TTCTGGGTCA TGCAAAGCAG ATCTTGGTTG CCGCAGCTAC 
2101 ACTGATGAAC TCCACGCGGT CGCCCCTTTG TAACAAAACG AAAGAGTCTA 
2151 AGCATTGCCA CTTTAGCTGC TGCCTCCCTC TGATTCCCCA GCTCATCTCC 
2201 CTGGTTGCAT GGCCCACTTG GAGCTGAGGT CTCATACAAG GATATTTGGA 
2251 GTGAAATGCT GGCCAGTACT TGTTCTCCCT TGCCCCAACC CTTTACCGGA 
2301 TATCTTGACA AACTCTCCAA TTTTCTAAAA TGATATGGAG CTCTGAAAAA 
2351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2401 AAAAAA 



BLAST Results 



No BLAST result 
Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 346 bp to 2130 bp; peptide length: 595 
Category: similarity to known protein 
Classification: unclassified 



1 MESQPFLNMK FETDYFVKVV PFPSIKNESN YHPFFFRTRA CDLLLQPDNL 
51 ACKPFWKPRN LNISQHGSDM QVSFDHAPHN FGFRFFYLHY KLKHEGPFKR 
101 KTCKQEQTTE MTSCLLQNVS PGDYIIELVD DTNTTRKVMH YALKPVHSPW 
151 AGPIRAVAIT VPLVVISAFA TLFTVMCRKK QQENIYSHLD EESSESSTYT 
201 AALPRERLRP RPKVFLCYSS KDGQNHMNVV QCFAYFLQDF CGCEVALDLW 
251 EDFSLCREGQ REWVIQKIHE SQFIIVVCSK GMKYFVDKKN YKHKGGGRGS 
301 GKGELFLVAV SAIAEKLRQA KQSSSAALSK FIAVYFDYSC EGDVPGILDL 
351 STKYRLMDNL PQLCSHLHSR DHGLQEPGQH TRQGSRRNYF RSKSGRSLYV 
401 AICNMHQFID EEPDWFEKQF VPFHPPPLRY REPVLEKFDS GLVLNDVMCK 
451 PGPESDFCLK VEAAVLGATG PADSQHESQH GGLDQDGEAR PALDGSAALQ 
501 PLLHTVKAGS PSDMPRDSGI YDSSVPSSEL SLPLMEGLST DQTETSSLTE 
551 SVSSSSGLGE EEPPALPSKL LSSGSCKADL GCRSYTDELH AVAPL 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2013, frame 1 

TREMBL:U58917_1 product: "IL-17 receptor"; Homo sapiens IL-17 receptor 
mRNA, complete cds., N = 1, Score = 215, P = 4.7e-14 

TREMBL:MM31993_1 product: "interleukin 17 receptor"; Mus musculus 
interleukin 17 receptor mRNA, complete cds., N = 2, Score = 152, P = 1.1e-13 



>TREMBL:U58917_1 product: "IL-17 receptor"; Homo sapiens IL-17 receptor 
mRNA, complete cds. 
Length = 866 

HSPs: 

Score = 215 (32.3 bits), Expect = 4.7e-14, P = 4.7e-14 
Identities = 85/284 (29%), Positives = 131/284 (46%) 



[Aligned residues not recoverable from this copy; row start positions and surviving match-line fragments follow.] 

Query: 213    Sbjct: 379 
Query: 269    Sbjct: 438 
Query: 325    Sbjct: 498 
Query: 384    Sbjct: 551 
Query: 435    Sbjct: 611 

KV++ YS+ D +++VV FA FL CG EVALDL E+ ++ 
WV QK 
IIV+CS+G + 
+LF A++ I 
++ YF + SC+GDVP + + +Y LMD ++ + +D + +PG+ R 
G S NY RS GR L A+ 
PDWFE + + 
P L 
+ EP+ 
+G+V 
++ PS CL++ VG G A 
G+ 



Pedant information for DKFZphtes3_2013, frame 1 



Report for DKFZphtes3_2013.1 
[LENGTH] 595 

[MW] 66847.05 

[pI] 6.27 

[HOMOL] TREMBL:MM31993_1 product: "interleukin 17 receptor"; Mus musculus interleukin 
17 receptor mRNA, complete cds. 2e-14 
[BLOCKS] BL00740A MAM domain proteins 

[BLOCKS] BL01224B N-acetyl-gamma-glutamyl-phosphate reductase proteins 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 13.61 % 

SEQ MESQPFLNMKFETDYFVKVVPFPSIKNESNYHPFFFRTRACDLLLQPDNLACKPFWKPRN 

SEG 

PRD ccccccccccccccceeeeeccccccccccceeeeeeceeeeeeeccccccccccccccc 

MEM 

SEQ LNISQHGSDMQVSFDHAPHNFGFRFFYLHYKLKHEGPFKRKTCKQEQTTEMTSCLLQNVS 

SEG 

PRD eeeecccccceeeecccccccceeeeeehhhhhhcccchhhhhhhhhhhhhhhhhhcccc 

MEM 

SEQ PGDYIIELVDDTNTTRKVMHYALKPVHSPWAGPIRAVAITVPLVVISAFATLFTVMCRKK 

SEG 

PRD ccceeeeeeccccccccccccccccccccccccceeeeccchhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMM 

SEQ QQENIYSHLDEESSESSTYTAALPRERLRPRPKVFLCYSSKDGQNHMNVVQCFAYFLQDF 

SEG xxxxxxx xxxxxxxxxx 

PRD hhhhhhhhhcccccccceeeeccccccccccceeeeeeecccccchhhhhhhhhhhhhhc 

MEM 

SEQ CGCEVALDLWEDFSLCREGQREWVIQKIHESQFIIVVCSKGMKYFVDKKNYKHKGGGRGS 

SEG xxxxxxxxx 

PRD ccchhhhhhhhccccccccchhhhhhhhhhheeeeeeeeccceeeeeccccccccccccc 

MEM 

SEQ GKGELFLVAVSAIAEKLRQAKQSSSAALSKFIAVYFDYSCEGDVPGILDLSTKYRLMDNL 

SEG XXX xxxxxxxxxxxxxxx 

PRD ccceeeeehhhhhhhhhhhhhhcchhhhhhhheeeeccccccccccccccchhhhhhccc 

MEM 

SEQ PQLCSHLHSRDHGLQEPGQHTRQGSRRNYFRSKSGRSLYVAICNMHQFIDEEPDWFEKQF 

SEG 

PRD cchhhhhhcccccccccccccccccceeeeccccccceeeeeeceeeecccccceeeeee 

MEM 

SEQ VPFHPPPLRYREPVLEKFDSGLVLNDVMCKPGPESDFCLKVEAAVLGATGPADSQHESQH 

SEG 

PRD eecccccccccceeeeeccccceeeeecccccccccchhhhhhhhhhccccccccccccc 

MEM 

SEQ GGLDQDGEARPALDGSAALQPLLHTVKAGSPSDMPRDSGIYDSSVPSSELSLPLMEGLST 

SEG xxxxxxxxxxxxxxxxx . 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccchh 

MEM 

SEQ DQTETSSLTESVSSSSGLGEEEPPALPSKLLSSGSCKADLGCRSYTDELHAVAPL 

SEG . .xxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhheeecccccccccccccccceeeccccceeeccccccccceeeeccc 

MEM 

(No Prosite data available for DKFZphtes3_2013.1) 
(No Pfam data available for DKFZphtes3_2013.1) 
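
The [KW] LOW_COMPLEXITY percentage can be reproduced from the SEG track: it is the number of masked positions (marked x) divided by the peptide length. In the report above the SEG rows appear to contain 81 masked positions in total, and 81 / 595 is 13.61 %, which agrees with the keyword line. A minimal Python sketch of that calculation follows, assuming that masked positions are marked with x in either case and that every other character is unmasked.

```python
def low_complexity_percent(seg_rows, peptide_length: int) -> float:
    """Percentage of residues masked as low complexity in a SEG track."""
    masked = sum(row.lower().count("x") for row in seg_rows)
    return round(100.0 * masked / peptide_length, 2)


# 81 masked positions in a 595-residue peptide.
print(low_complexity_percent(["x" * 81], 595))
```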
DKFZphtes3_20ml8 



group: nucleic acid management 

DKFZphtes3_20ml8 encodes a novel 132 amino acid protein with similarity to the S. cerevisiae 
mitochondrial carrier protein RIM2. 

The novel protein contains a leucine zipper and a Prosite mitochondrial energy transfer 
proteins signature. It is a member of a family of substrate carrier proteins that are found in 
the inner mitochondrial membrane and are involved in energy transfer. The RIM2/MRS12 gene 
encodes a predicted protein of 377 amino acids that is essential for mitochondrial DNA 
metabolism and proper cell growth. Inactivation of this gene causes the total loss of 
mitochondrial DNA and, compared to wild-type rho0 controls, a slow-growth phenotype on media 
containing glucose. The novel protein seems to be the human orthologue of this protein. 

The new protein can find application in modulation of mitochondrial DNA replication and 
maintenance. 



similarity to carrier protein RIM2 

Sequenced by MediGenomix 

Locus : unknown 

Insert length: 3572 bp 

Poly A stretch at pos. 3530, polyadenylation signal at pos. 3510 



1 GCCGCGGGGA GGGCTGTGCC GGTTGCTTTC TGCAGCCGCA TCTCGGCCAG 
51 CTCTCCTCGC CGTCCCCGGG GCGCTGTGCG TCTCCAGTCC GGGACCGAAG 
101 CCGCCTGCCG TAGCGGGCGG CCAGATCCGC GTCCCGCCTC AGCGGCCGGA 
151 GGACATGCGG GAGAGAGAAT GAGCCAGAGG GACACGCTGG TGCATCTGTT 
201 TGCCGGAGGA TGTGGTGGTA CAGTGGGAGC TATTCTGACA TGTCCACTGG 
251 AAGTTGTAAA AACACGACTG CAGTCATCTT CTGTGACGCT TTATATTTCT 
301 GAAGTTCAGC TGAACACCAT GGCTGGAGCC AGTGTCAACC GAGTAGTGTC 
351 TCCCGGACCT CTTCATTGCC TAAAGGTGAT CTTGGAAAAA GAAGGGCCTC 
401 GTTCCTTGTT TAGAGGACTA GGCCCCAATT TAGTGGGGGT AGCCCCTTCC 
451 AGAGCAATAT ACTTTGCTGC TTATTCAAAC TGCAAGGAAA AGTTGAATGA 
501 TGTATTTGAT CCTGATTCTA CCCAAGTACA TATGATTTCA GCTGCAATGG 
551 CAGGTATGAA TGTATAATAT TAAAAAAAAA AAAAACTTTC TGAAACCTAG 
601 AGGCTTAATA TTGAATTATA AGTTTGTAGT GAAAAGTTGA TGATTAATGT 
651 GCTTTTCATT GATTAGATGA TTTTTACGTT TATCGATATA AACCAAATTA 
701 GGTATATGTA AAATCTGTCA TCAGTTGACA TTTTTGTAGT CAGGAGTTTA 
751 CATGCTAGGG TACAAGTAAT ATATTTATAT TGCCTTGTGT AGTCCACTGA 
801 ATGTTTAGTG ATCATTGTTA ACAGTTTTAA GAATCCAACC ATAATTACAC 
851 TATAAATAAG TTATGGAGCT GTAATTTACT CTTCTCTCCT CAATTTCTGT 
901 TAGTGCCTTT TCCCTTTTTG CTGCATGTTT TGGCTTCTGT CTGAAATGTG 
951 TCGGCAATTC TTGGTAAAGT ATTCATTTTG TCCTGTGCTC AAATGCTGAA 
1001 ATTTTTGTGA GTGATGTATT ATTATTGACA ATTCAGTTAC TATGTGTATT 
1051 TTTTAAAATT GTTTATTATT CTACATAATT CACACTAGAC AGCACCTGAA 
1101 ATTTAGACAC TGGCTATGTG TACATGCTTA CTATAGAAAT GTTTCCAGGA 
1151 ACTCTCTGTT TCTGTCATCA CTGATAAGTA TATATGATTC TGAATTAAAA 
1201 TAACTAGTTT TAGGTCTTTA CCCTGCCATA AAGATAAACA GTTGGTTTGA 
1251 CCAATCTGGT TCTGGAATCA TTTGCTGCTA TGCATGTTAG ACAAAGCCAC 
1301 GAACTTTGAT TTTCCATTGA AAATTCTCCC TAATATCTGA GATTTATTGT 
1351 ATATTTACTC ATATCTCACA TTTTCAAATT ATGCTGTAAC TTTATAAACT 
1401 GTAGCTGCTT TCATCAGCTA TTGATCAATA AATTGAATGT CAATTATGTG 
1451 CTTAATAATG AGTGCCTTAA ACTGTTAAAC ACTTTTGGTT TAGAAATAAA 
1501 GTGAATCAAT TTGACCTATA TACTTCATGA AGTAAGTAAG TTTGAAATAC 
1551 AAATTTCTGA AAGGTCAATA GCCCTTATCG TATTACAAAT TGTTTTTAAG 
1601 GCTTTTTGTA TTTATTAATT GTCAGTTGAT TCACTGAAGC TTTAAAACTG 
1651 GAAGGGACAA TCCAAAGGTC AAAAGAGTGA AATACAATCA TTTACCAATA 
1701 AGGAAACCTT GGGCAAATTA TGTAATTTAT GTGAACCTCT CTTAGCTTAC 
1751 CCATGGAATG AGTCAAGTGG TCTAGATAGA TTTGGATTTT GAGAATTAGT 
1801 TCTTTCATTT AGTGTTATAG AGATTATCTT GTTACAACTA GAATTATTTT 
1851 TAATGTAATT TTTACAGATG TTGAATATTA GTAGATAGGA TTTTTCCCCT 
1901 ACGAATTTGG ATGTAAGGTA AAGGTTGGTG GCCAGTGACA AACCTTATAA 
1951 CCACTTTATC AGGTTCTTTA AAAATATATT TGTGAATTAC CAGTGATTAT 
2001 GTTTTTGGCT TATAACCTCA GATAATTATA AAGAAATGTT AATCTTATTT 
2051 GAAAGAATTG GAATCTAGAA AGTTAGATGA GCAGTCATTT TATATTGATA 
2101 TTTGTTATAT CAGTATAGCA AATGCAGAGG TTCAGAATAT CTTTATTTCC 
2151 ACTGGAACAT CTTATTTCAT TAGAGTATCT CATCAGAATT TATTACTGTA 
2201 TTTGTATCAC ATTGCAAAGA ATTTCAGTAG AATTGTCAGT TTGCACTTTT 
2251 TTCTCAAATG TGTACAAATG TTAACATATA GTTCATTTTT ATCTGTACAT 
2301 TGATGCCATT TCCCAACTTG AATTCCTCAA GTTTTGGTAA ACTTACAATC 
2351 TCATACTTGT TCAGAGGTTA TTGCACTGTA CACTTACTGT GTAGAAAATA 
2401 CTGTTTGAAT TTGTTTGCAG TTACATTGTT CTGAGAACTG TGCTCTCAGA 
2451 GCTTCTGTGC ACTATTCATG AGCATTAACA CTTAGCCTTG CAGTTTTATA 
2501 CATAACTATA TGGTTAGTAA AACTGAATGG 

2551 GTAGGCTTTT GCCCCCTTTG TTCTTGAAAT 

2601 GGGTTTTTTT TAGGATTATT TTTATAGGTC 

2651 GTATGAAGTA CTTAAAGATA GTTCTGTGAA 

2701 TTCAAGGGAA AAAATGCTAA CCTTGTCACT 

2751 AAAATAAACC ATTAATGATA CTGCCTGCAA 

2801 CACACACATT AAGGATTTAT AAGGCACTGT 

2851 GACCTCTCAA TTCATTTTCA TTTTGCATTT 

2901 TAATTTAGAT AATAAAAATT TATTTTATTA 

2951 TGGGTCTTTT TATTTGTTGT AGTGCATACT 

3001 AAAGTTGAGC TATAAATTTT CATGCATTAA 

3051 GATATTTAAT CAGATTAAAT AATGTTGACT 

3101 TTTTTTCTCC TACACATGAC CTTTGACAGA 

3151 GAGGGTATCT GTTTTGTTGC CTGTATATTT 

3201 TTCCTTTGTA TACACCTAGG CACAGATGTA 

3251 TTACTTCTTT CTTTATACTA ATTCTCAATT 

3301 ATGTATATAC TTTTATATAG AACATTATAA 

3351 AATTTTAATT GGATTATGTA TTCATACAGT 

3401 CTAATAATGT AATCATTGAA TGTTTCCTAC 

3451 GCTCACAGCA TACAGTTATT TTTCAATTTA 

3501 ATTTCATTAT AATAAAGGCT TTTACTCATT 

3551 AAAAAAAAAA AAAAAAAAAA AA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 
95198680: 

Overexpression of a novel member of the mitochondrial carrier family rescues defects in both 
DNA and RNA metabolism in yeast mitochondria. 



Peptide information for frame 1 



ORF from 169 bp to 564 bp; peptide length: 132 
Category: similarity to known protein 
Classification: Intracellular transport and traffic 
Prosite motifs: LEUCINE_ZIPPER (27-49) 
MITOCH_CARRIER (26-36) 
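
The LEUCINE_ZIPPER motif follows the pattern L-x(6)-L-x(6)-L-x(6)-L, i.e. four leucines spaced seven residues apart. The Python sketch below scans a peptide for that pattern with a regular expression and reports 1-based coordinates; it is an illustration of the pattern only, not the scanning tool used for this report, and the end coordinate it reports may follow a different convention from the one printed above.

```python
import re

# Lookahead so that overlapping occurrences are also reported.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(peptide: str):
    """Return 1-based (start, end) spans matching L-x(6)-L-x(6)-L-x(6)-L."""
    return [
        (m.start() + 1, m.start() + len(m.group(1)))
        for m in LEUCINE_ZIPPER.finditer(peptide)
    ]


# First 50 residues of the 132 amino acid peptide of this clone.
peptide = "MSQRDTLVHLFAGGCGGTVGAILTCPLEVVKTRLQSSSVTLYISEVQLNT"
print(find_leucine_zippers(peptide))
```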



1 MSQRDTLVHL FAGGCGGTVG AILTCPLEVV KTRLQSSSVT LYISEVQLNT 
51 MAGASVNRVV SPGPLHCLKV ILEKEGPRSL FRGLGPNLVG VAPSRAIYFA 
101 AYSNCKEKLN DVFDPDSTQV HMISAAMAGM NV 
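
The peptide can be cross-checked against the nucleotide listing by translating from the ORF start with the standard genetic code. The Python sketch below builds the standard codon table and translates until the first stop codon; the function name and the handling of unknown codons (reported as X) are illustrative choices. The example uses the listing row that starts at position 151, in which the ORF start at 169 bp falls at offset 19.

```python
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_orf(dna: str, start_bp: int) -> str:
    """Translate from a 1-based start position up to the first stop codon."""
    dna = dna.upper().replace(" ", "")
    peptide = []
    for i in range(start_bp - 1, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


row_151 = "GGACATGCGG GAGAGAGAAT GAGCCAGAGG GACACGCTGG TGCATCTGTT"
print(translate_orf(row_151, 19))  # MSQRDTLVHL
```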



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_20ml8, frame 1 

PIR:S44092 probable carrier protein c2 - Caenorhabditis elegans, N = 2, 
Score = 147, P = 1.5e-19 

PIR:S36081 probable carrier protein RIM2, mitochondrial - yeast 
(Saccharomyces cerevisiae), N = 1, Score = 230, P = 6.2e-19 



>PIR:S36081 probable carrier protein RIM2, mitochondrial - yeast 
(Saccharomyces cerevisiae) 
Length = 377 

HSPs: 

Score = 230 (34.5 bits), Expect = 6.2e-19, P = 6.2e-19 
Identities = 55/133 (41%), Positives = 80/133 (60%) 

Query: 8 VHLFAGGCGGTVGAILTCPLEVVKTRLQSSS-VTLYISEVQLNTMAGA----SVNRVVSP 62 

VH AGG GG GA++TCP ++VKTRLQS + Y S+ +N G+ S+N V+ 
Sbjct: 54 VHFVAGGIGGMAGAVVTCPFDLVKTRLQSDIFLKAYKSQA-VNISKGSTRPKSINYVIQA 112 



TCCAATGCAG 
AATCTAGACC 
TAAATATGAA 
AAATCATTTT 
TTACTACACA 
GATTTTAACA 
ACGTAATTTT 
TATCCATATG 
AAAGGACAGT 
ATAAGAATTT 
AAATTTGTTT 
CTTAATATTT 
CTAAGTATAT 
TGTTTAAATT 
TGCAAAAAAA 
TTTAAAAGAT 
ATGTAAAGGA 
TATTCTCAAT 
ATACGTAGTG 
TGTTTTTCTA 
AAATACAAAA 



ACTCATTAAA 
AGATTACTCG 
TGATTTGGGG 
CAGCTGTCTA 
AAACCACACT 
CACCAGATAG 
TATTCCAAGT 
AACTCATGTT 
TTATTTAAAG 
GTAAGCCTCT 
CAGTTGTGAG 
TGCCTGCCTT 
CTCAGCTATT 
AACTTGTATA 
ATTTGTTAAA 
TTTATCTGGC 
AATGAATTCT 
TTTTAAAATA 
GGTTTTATTT 
TTAGACTTAA 
AAAAAAAAAA

Query: 63 GP-----LHCLKVILEKEGPRSLFRGLGPNLVGVAPSRAIYFAAYSNCKEKLNDVFD--P 115

G L + + ++EG RSLF+GLGPNLVGV P+R+I FY K+ F+ 

Sbjct: 113 GTHFKETLGIIGNVYKQEGFRSLFKGLGPNLVGVIPARSINFFTYGTTKDMYAKAFNNGQ 172 

Query: 116 DSTQVHMISAAMAG 129

++ +H+++AA AG
Sbjct: 173 ETPMIHLMAAATAG 186

Score = 77 (11.6 bits), Expect = 1.1e+00, P = 6.8e-01
Identities = 25/88 (28%), Positives = 39/88 (44%)

Query: 3 QRDTLVHLFAGGCGGTVGAILTCPLEVVKTRLQSSSVTLYISEVQLNTMAGASVNRVVSP 62 

Q ++HL A G A T P+ ++KTR VQL+ SV + + 

Sbjct: 172 QETPMIHLMAAATAGWATATATNPIWLIKTR------------VQLDKAGKTSVRQYKNS 219

Query: 63 GPLHCLKVILEKEGPRSLFRGLGPNLVG 90

CLK ++ EG L++GL + +G
Sbjct: 220 WD--CLKSVIRNEGFTGLYKGLSASYLG 245

Score = 71 (10.7 bits), Expect = 6.6e+00, P = 1.0e+00
Identities = 28/91 (30%), Positives = 45/91 (49%)

Query: 12 AGGCGGTVGAILTCPLEVVKTRLQSSSVTLYISEVQLNTMAGASVNRVVSPGPLHCLKVI 71

+ G V +I T P EVV+TRL+ + +NG R+G+KVI
Sbjct: 294 SAGLAKFVASIATYPHEVVRTRLRQT P KEN G KRKYT-GLVQSFKVI 338

Query: 72 LEKEGPRSLFRGLGPNLVGVAPSRAIYFAAY 102

+++EG S++ GL P+L+ P+ I F + 
Sbjct: 339 IKEEGLFSMYSGLTPHLMRTVPNSIIMFGTW 369 
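
The percentages printed next to the identity and positive counts in these HSP summaries can be recomputed from the raw counts. The sketch below assumes the percentage is truncated rather than rounded; this is inferred from figures in this report such as 28/91 being printed as 30%, and is not taken from the BLAST documentation. The helper names are hypothetical.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Integer percentage for an 'n/m (p%)' entry, truncated rather than rounded."""
    if aligned <= 0:
        raise ValueError("alignment length must be positive")
    return matches * 100 // aligned


def format_hsp_stats(identities: int, positives: int, aligned: int) -> str:
    """Rebuild the 'Identities = ..., Positives = ...' summary line of an HSP."""
    return (
        f"Identities = {identities}/{aligned} ({blast_percent(identities, aligned)}%), "
        f"Positives = {positives}/{aligned} ({blast_percent(positives, aligned)}%)"
    )


print(format_hsp_stats(55, 80, 133))
print(format_hsp_stats(28, 45, 91))
```

Recomputing the line this way is a quick consistency check when a scanned copy has damaged the "=" signs or digits in a summary line.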

Pedant information for DKFZphtes3_20m18, frame 1



Report for DKFZphtes3_20m18.1

[LENGTH] 132 

[MW] 13993.36

[pi] 8.42 

[HOMOL] PIR:S36081 probable carrier protein RIM2, mitochondrial - yeast (Saccharomyces 
cerevisiae) 7e-19 

[FUNCAT] 07.16 purine and pyrimidine transporters [S. cerevisiae, YBR192w] 3e-20

[FUNCAT] 08.04 mitochondrial transport [S. cerevisiae, YBR192w] 3e-20

[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YBR192w] 3e-20

[FUNCAT] 02.13 respiration [S. cerevisiae, YBR192w] 3e-20

[FUNCAT] 01.05.07 carbohydrate transport [S. cerevisiae, YPR021c] 3e-10 

[FUNCAT] 07.07 sugar and carbohydrate transporters [S. cerevisiae, YPR021c] 3e-10

[FUNCAT] 07.99 other transport facilitators [S. cerevisiae, YEL006w] 1e-09

[FUNCAT] 01.07.10 transport of vitamins, cofactors, and prosthetic groups [S. cerevisiae, YIL006w] 3e-09

[FUNCAT] 07.04.07 anion transporters (cl, so4, po4, etc.) [S. cerevisiae, YKL120w] 2e-08

[FUNCAT] 01.03.19 nucleotide transport [S. cerevisiae, YPR011c] 3e-08

[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YKR052c] 4e-08

[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YJR095w] 2e-07

[FUNCAT] 01.01.07 amino-acid transport [S. cerevisiae, YOR130C] 5e-05 

[FUNCAT] 07.10 amino-acid transporters [S. cerevisiae, YOR130c] 5e-05 

[FUNCAT] 01.04.07 phosphate transport [S. cerevisiae, YJR077c] 7e-05 

[FUNCAT] 13.04 homeostasis of other ions [S. cerevisiae, YJR077c] 7e-05 

[BLOCKS] BL00215B Mitochondrial energy transfer proteins 

[BLOCKS] BL00215A Mitochondrial energy transfer proteins 

[PIRKW] duplication 6e-09 

[PIRKW] transmembrane protein 6e-09 

[PIRKW] mitochondrial inner membrane 4e-07 

[PIRKW] transport protein 5e-06 

[PIRKW] mitochondrion 7e-08

[PIRKW] chloroplast 3e-08 

[SUPFAM] Bt1 protein 3e-08

[SUPFAM] ADP,ATP carrier protein repeat homology 4e-09 

[SUPFAM] Caenorhabditis probable carrier protein c2 4e-09 

[SUPFAM] probable carrier protein YPR021c 6e-09

[PROSITE] LEUCINE_ZIPPER 1 

[PROSITE] MITOCH_CARRIER 1 

[PFAM] Mitochondrial carrier proteins 

[KW] Alpha_Beta 

SEQ MSQRDTLVHLFAGGCGGTVGAILTCPLEVVKTRLQSSSVTLYISEVQLNTMAGASVNRVV

PRD cccccceeeecccccccceeeeeecchhhhhhhhhhhccccccccccccccccccccccc

SEQ SPGPLHCLKVILEKEGPRSLFRGLGPNLVGVAPSRAIYFAAYSNCKEKLNDVFDPDSTQV 

PRD cccchhhhhhhhhhcccceeeeccccceeeecccceeeeeehhhhhhhhhcccccccccc 

SEQ HMISAAMAGMNV 

PRD chhhhhhhcccc 
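
The PRD lines can be summarised as a secondary-structure composition. This is a minimal sketch, assuming the usual reading of the codes ('h' helix, 'e' extended strand, 'c' coil); the report itself does not define them, and the function name is illustrative.

```python
from collections import Counter


def ss_composition(prd: str) -> dict:
    """Fraction of residues predicted as helix (h), strand (e) and coil (c)."""
    if not prd:
        raise ValueError("empty prediction string")
    counts = Counter(prd)
    return {state: counts.get(state, 0) / len(prd) for state in "hec"}


# Last PRD line of the report above (12 residues).
print(ss_composition("chhhhhhhcccc"))
```

Concatenating all PRD lines of a report before calling the function gives the composition for the whole peptide, which can then be compared with the [KW] class assigned in the report.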



Prosite for DKFZphtes3_20m18.1

PS00029 27->49 LEUCINE_ZIPPER PDOC00029 

PS00215 26->36 MITOCH_CARRIER PDOC00189
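
The LEUCINE_ZIPPER hit can be reproduced with a plain regular expression. The sketch below assumes the commonly cited consensus for PS00029, L-x(6)-L-x(6)-L-x(6)-L; the pattern is not spelled out in this report. It returns 1-based coordinates with an inclusive end, whereas the table above appears to print the end position plus one.

```python
import re

# Lookahead so that overlapping candidate sites are all reported.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(peptide: str):
    """Return (start, end) pairs, 1-based with inclusive end, for each pattern hit."""
    return [
        (m.start() + 1, m.start() + len(m.group(1)))
        for m in LEUCINE_ZIPPER.finditer(peptide)
    ]


# Frame 1 peptide as listed under "Peptide information for frame 1".
PEPTIDE = (
    "MSQRDTLVHLFAGGCGGTVGAILTCPLEVVKTRLQSSSVTLYISEVQLNT"
    "MAGASVNRVVSPGPLHCLKVILEKEGPRSLFRGLGPNLVGVAPSRAIYFA"
    "AYSNCKEKLNDVFDPDSTQVHMISAAMAGMNV"
)

print(find_leucine_zippers(PEPTIDE))  # [(27, 48)]
```

The single hit starts at residue 27, in line with the 27->49 entry once the difference in end-coordinate convention is taken into account.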



Pfam for DKFZphtes3_20m18.1



HMM_NAME Mitochondrial carrier proteins 

HMM *pFwkdFLAGGIAGmMeHTvMFPIDllKTRMQlQgEMpM..ahpR

+++++++AGG +G + +++++P++++KTR+Q++ ++ + ++ 
Query 5 DTLVHLFAGGCGGTVGAILTCPLEVVKTRLQSS-SVTLYISEVQLNTMA 52 

HMM YkGMIdCFRwIwkNEGWRGLWRGLgANvIRYIPqWalRFGFY 

G+++C++ I+++EG+R+L+RGLG+N+++++P +AI + F+ Y 
Query 53 GASVNRVVSPGPLHCLKVILEKEGPRSLFRGLGPNLVGVAPSRAIYFAAY 102 

HMM EFMKeMFiDyfgeddnyWmWFwmnYMaGs* 

+KE ++D F++ D++++++ + +MAG+ 
Query 103 SNCKEKLNDVFDP-DSTQVHMISAAMAGM 130

group: signal transduction

DKFZphtes3_21d4 encodes a novel 464 amino acid putative GTP exchanging factor related to RCC1.

RCC1 (regulator of chromosome condensation) is a eukaryotic protein which binds to chromatin 
and interacts with ran, a nuclear GTP-binding protein. RCC1 promotes the exchange of bound GDP 
with GTP, acting as a guanine-nucleotide dissociation stimulator. 

The new protein can find application in the regulation of gene expression by activation of
nuclear GTP-binding proteins. X-linked retinitis pigmentosa results from a defective GTPase
regulator, which contains an RCC1-type repeat.



similarity to RCC1-like G exchanging factor RLG
complete cDNA, complete cds, EST hits 
Sequenced by LMU 
Locus: /map->"20" 
Insert length: 2321 bp 

Poly A stretch at pos. 2293, polyadenylation signal at pos. 2262 
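
The polyadenylation signal can be located by searching the 3' end of the insert for the canonical hexamers. This is a minimal sketch, assuming the signal is AATAAA or its common variant ATTAAA and that positions are counted 1-based from the first base of the insert; the report does not state which motifs or coordinate convention it uses. The excerpt is bases 2251-2300 of the listing for this clone.

```python
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_signals(seq: str, first_pos: int = 1):
    """Return sorted (position, motif) pairs; position is that of the motif's first base."""
    seq = seq.upper()
    hits = []
    for motif in POLYA_SIGNALS:
        idx = seq.find(motif)
        while idx != -1:
            hits.append((first_pos + idx, motif))
            idx = seq.find(motif, idx + 1)
    return sorted(hits)


# Bases 2251-2300 of the 2321 bp insert.
TAIL = "AATCTGTCTTGGAATAAAGTCTATTTTCTGCCTTTTGGTTTTTAAAAAAA"
print(find_polya_signals(TAIL, first_pos=2251))  # [(2263, 'AATAAA')]
```

The hexamer starts at base 2263 under this counting, one position higher than the 2262 quoted above; the offset may simply reflect a different coordinate convention in the annotation software, which is an inference rather than something the report states.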



1 GGGTCACGCA AGATGGCGGC GCCCAGAGGC TGCTGAGGCG CGGAACGGAG 
51 GATGGCGCTG GTGGCGTTGG TGGCTGGGGC TCGGCTGGGG CGGCGGCTGA 
101 GCGGGCCGGG GCTGGGGCGA GGGCACTGGA CGGCGGCCAG GCGCTCCCGG 
151 AGCCGGCGCG AAGCGGCAGA AGCCGAGGCG GAGGTGCCCG TGGTCCAGTA 
201 CGTGGGCGAG CGCGCTGCCC GCGCCGATCG CGTCTTCGTG TGGGGCTTCA 
251 GCTTCTCGGG GGCGCTGGGC GTGCCTTCCT TTGTGGTGCC CAGCTCCGGG 
301 CCCGGGCCCC GCGCCGGCGC CCGACCGCGC CGCAGGATCC AGCCCGTGCC 
351 CTATCGCCTG GAGCTGGACC AAAAGATTTC ATCTGCTGCT TGCGGCTATG 
401 GATTCACACT GCTGTCCTCT AAGACTGCGG ATGTTACGAA AGTCTGGGGG 
451 ATGGGACTCA ACAAAGATTC TCAGCTTGGA TTTCACAGGA GCCGGAAAGA 
501 TAAAACGAGG GGCTACGAGT ATGTGTTGGA GCCCTCACCC GTCTCCCTGC 
551 CTCTGGACAG ACCTCAGGAG ACACGGGTGC TGCAGGTCTC CTGCGGCCGA 
601 GCTCACTCTC TTGTGTTGAC TGACAGGGAA GGAGTCTTCA GCATGGGAAA 
651 CAATTCTTAT GGGCAATGTG GAAGAAAGGT GGTCGAAAAT GAAATTTACA 
701 GTGAAAGTCA CAGAGTCCAC AGGATGCAGG ACTTCGATGG CCAGGTGGTC 
751 CAGGTCGCCT GTGGTCAGGA TCATAGTCTG TTCCTGACGG ATAAAGGAGA 
801 AGTCTATTCT TGTGGATGGG GTGCTGATGG GCAAACAGGT CTGGGTCACT 
851 ACAATATCAC CAGCTCGCCC ACCAAGCTGG GTGGAGACCT GGCGGGAGTG 
901 AACGTTATCC AAGTTGCCAC CTACGGTGAT TGCTGCCTGG CCGTGTCCGC 
951 CGACGGAGGA CTTTTTGGTT GGGGAAACTC GGAGTACCTG CAGCTGGCCT 
1001 CTGTCACTGA CTCCACACAG GTGAATGTGC CCCGCTGCTT ACACTTCTCA 
1051 GGAGTGGGGA AGGTGCGACA GGCTGCATGC GGTGGCACGG GCTGTGCAGT 
1101 GTTAAACGGA GAAGGACATG TTTTTGTCTG GGGCTATGGA ATTCTTGGGA 
1151 AAGGTCCAAA CCTAGTGGAA AGTGCCGTCC CTGAAATGAT TCCACCCACT 
1201 CTCTTTGGCT TGACGGAGTT CAACCCAGAA ATCCAGGTTT CCCGCATCCG 
1251 ATGTGGACTC AGCCACTTTG CTGCACTGAC CAACAAAGGA GAGCTGTTTG 
1301 TATGGGGCAA GAACATCCGA GGGTGCCTGG GAATCGGTCG CCTGGAGGAC 
1351 CAGTATTTCC CATGGAGGGT GACGATGCCT GGGGAGCCTG TGGACGTGGC 
1401 ATGTGGCGTG GACCACATGG TGACCCTGGC CAAGTCATTC ATCTAAACCT 
1451 CCCTCACCTG CTTGGGCGGC CCCGTCCCGG GAACCACTGG CACTCCTTGG 
1501 CAGAGGCCAG CGCGTGGCCA GCCCCCCGGG GTTCTTGGAT GGTGGTGGCG 
1551 GAGGACCCTG CGTGCAGTGT GACGCTCTGT CCTGAATCCC TTAGCGGGTA 
1601 CCTACCAGGA GGATCAGGGC AAGGTCCCTC TCCAGCTGCA GGTGAGGCCT 
1651 GCGGAACTCA GCTTGGATGG CAGCCTTTGG TGGGCCGCTG TGGCCCGCAC 
1701 GTCTCTGTTC TCTCCAAGTA ACATGCGACG GTGTCTGGTG TCACGTCTCG 
1751 CCTGAGAAGC CCGTCTTAGG AAAGCTTAGC TTGAACACAG TGCTCGGGAG 
1801 GTTTCTGCTC TGTCTGTCAT GGCAGTCTCT TGGTTTGTGT CTGGCCAAGG 
1851 CCATGCGTGT GCCTCGGACC GAGCCCCAGC TTAGGCGAGG GAGTCAGGCT 
1901 GGCTTCGGCC CTCGGTTTTC ATTCAGGCCA CCCTGCTCAT GGCCCTTCCT 
1951 GGCCGCCTGC CACACCGCAA GCTCGCTGGG GGGACACTAG AAGCACCGTG 
2001 GCCTGGGATT CCATCTGGAG CTGTCCGCAG GCACCAGCCC CAGCCTCCCA 
2051 CCACGCTCAC TGCCTGGCTT GGAAAAGTTA AGAAGCCCCT CAGGAAGAGA 
2101 ATCGAGGCTA AGTTCCTCTG CGCCGAGGGC CCCGAGCATA TCCGCCAAGG 
2151 CTCAGCTGCA GTGCCAGGCG GAGGAGGAAG ATCCAGAAAT TGTGAACAAT 
2201 GTTTGATTTA GTAGCGTGAC TTGCCTTTCC CTTTAAAAAC ATCTTTTACA 
2251 AATCTGTCTT GGAATAAAGT CTATTTTCTG CCTTTTGGTT TTTAAAAAAA 
2301 AAAAAAAAAA AAAAAAAAAA A 



BLAST Results

Entry HS203358 from database EMBL:
human STS SHGC-31781.
Score = 1748, P = 1.1e-72, identities = 376/394



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 52 bp to 1443 bp; peptide length: 464
Category: similarity to known protein 



1 MALVALVAGA RLGRRLSGPG LGRGHWTAAR RSRSRREAAE AEAEVPVVQY
51 VGERAARADR VFVWGFSFSG ALGVPSFVVP SSGPGPRAGA RPRRRIQPVP
101 YRLELDQKIS SAACGYGFTL LSSKTADVTK VWGMGLNKDS QLGFHRSRKD 
151 KTRGYEYVLE PSPVSLPLDR PQETRVLQVS CGRAHSLVLT DREGVFSMGN 
201 NSYGQCGRKV VENEIYSESH RVHRMQDFDG QVVQVACGQD HSLFLTDKGE 
251 VYSCGWGADG QTGLGHYNIT SSPTKLGGDL AGVNVIQVAT YGDCCLAVSA 
301 DGGLFGWGNS EYLQLASVTD STQVNVPRCL HFSGVGKVRQ AACGGTGCAV 
351 LNGEGHVFVW GYGILGKGPN LVESAVPEMI PPTLFGLTEF NPEIQVSRIR 
401 CGLSHFAALT NKGELFVWGK NIRGCLGIGR LEDQYFPWRV TMPGEPVDVA 
451 CGVDHMVTLA KSFI 



BLASTP hits 



Entry CEW09G3_5 from database TREMBLNEW: 

gene: "W09G3.3"; Caenorhabditis elegans cosmid W09G3 

Score = 395, P = 9.3e-37, identities = 111/330, positives = 165/330

Entry Y032_HUMAN from database SWISSPROT:
HYPOTHETICAL PROTEIN KIAA0032.

Score = 309, P = 1.0e-24, identities = 96/308, positives = 143/308

Entry B38919 from database PIR:
hypothetical protein 2 - human (fragment)

Score = 309, P = 1.0e-24, identities = 96/308, positives = 143/308

Entry AF060219_1 from database TREMBLNEW:

product: "RCC1-like G exchanging factor RLG"; Homo sapiens RCC1-like G
exchanging factor RLG mRNA, complete cds.

Score = 273, P = 4.0e-21, identities = 84/262, positives = 124/262

Entry S71752 from database PIR:
giant protein p619 - human

Score = 282, P = 1.1e-19, identities = 86/287, positives = 144/287



Alert BLASTP hits for DKFZphtes3_21d4, frame 1
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_21d4, frame 1 



Report for DKFZphtes3_21d4 . 1 



[LENGTH] 464

[MW] 49997.08 

[pi] 8.74 

[HOMOL] TREMBL:CEW09G3_5 gene: "W09G3.3"; Caenorhabditis elegans cosmid W09G3 5e-34

[FUNCAT] 04.07 rna transport [S. cerevisiae, YGL097w] 2e-09

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YGL097w] 2e-09

[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YGL097w] 2e-09

[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. cerevisiae, YGL097w] 2e-09

[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YGL097w] 2e-09

[FUNCAT] 04.03.03 trna processing [S. cerevisiae, YGL097w] 2e-09 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YGL097w] 2e-09

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YAL020c] 4e-06

[BLOCKS] BL00870I

[BLOCKS] BL00625B Regulator of chromosome condensation (RCC1) proteins

[BLOCKS] BL00625A Regulator of chromosome condensation (RCC1) proteins

[PIRKW] blocked amino end 3e-16

[PIRKW] nucleus 3e-16

[PIRKW] duplication 4e-08

[PIRKW] tandem repeat 3e-16

[PIRKW] DNA binding 3e-16

[PIRKW] mitosis 3e-16

[PIRKW] leucine zipper 3e-21

[SUPFAM] pheromone response pathway component SRM1 4e-08

[SUPFAM] WD repeat homology 3e-21

[PROSITE] MYRISTYL 7

[PROSITE] RCC1_2 2

[PROSITE] AMIDATION 2

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 5

[PROSITE] TYR_PHOSPHO_SITE 2

[PROSITE] GLYCOSAMINOGLYCAN 3

[PROSITE] PKC_PHOSPHO_SITE 7

[PROSITE] ASN_GLYCOSYLATION 2

[PFAM] Regulator of chromosome condensation (RCC1)

[KW] All_Beta

[KW] LOW_COMPLEXITY 13.58 %



SEQ MALVALVAGARLGRRLSGPGLGRGHWTAARRSRSRREAAEAEAEVPVVQYVGERAARADR 

SEG .xxxxxxxxxxxxxxxxxxxxxxx. . . xxxxxxxxxxxxxxxxx 

PRD ccchhhhhhhhhheeeccccccccchhhhhhhhhhhhhhhhhhhceeeeeehhhhhhhhh 

SEQ VFVWGFSFSGALGVPSFVVPSSGPGPRAGARPRRRIQPVPYRLELDQKISSAACGYGFTL 

SEG xxxxxxxxxxxxxxxxxxxxxxx 

PRD eeeeccccccccccceeeeeccccccccccccccccccccchhhhhhhheeeccccceee 

SEQ LSSKTADVTKVWGMGLNKDSQLGFHRSRKDKTRGYEYVLEPSPVSLPLDRPQETRVLQVS 

SEG 

PRD eecccccceeeeccccccccccccccccccccccceeeeeccccccccccccccceeeee 

SEQ CGRAHSLVLTDREGVFSMGNNSYGQCGRKVVENEIYSESHRVHRMQDFDGQVVQVACGQD 

SEG 

PRD cccceeeeeeccceeeeeccccccccccccccccccccccccccccccceeeeeeecccc 

SEQ HSLFLTDKGEVYSCGWGADGQTGLGHYNITSSPTKLGGDLAGVNVIQVATYGDCCLAVSA 

SEG 

PRD eeeeeecccceeeecccccccccccccccccccccccccccceeeeeeecccceeeeeec 

SEQ DGGLFGWGNSEYLQLASVTDSTQVNVPRCLHFSGVGKVRQAACGGTGCAVLNGEGHVFVW 

SEG 

PRD ccceeeeccccccccccccccccccccccccccccceeeeeccccceeeeeecccceeee 

SEQ GYGILGKGPNLVESAVPEMIPPTLFGLTEFNPEIQVSRIRCGLSHFAALTNKGELFVWGK 

SEG 

PRD cccccccccccccccccccccceeeeeeecccceeeeeeecccceeeeeecccceeeecc 

SEQ NIRGCLGIGRLEDQYFPWRVTMPGEPVDVACGVDHMVTLAKSFI 

SEG 

PRD cccccccccccccccccceeecccceeeeecccccccccccccc



Prosite for DKFZphtes3_21d4.1

PS00001 200->204 ASN_GLYCOSYLATION PDOC00001
PS00001 268->272 ASN_GLYCOSYLATION PDOC00001
PS00002 17->21 GLYCOSAMINOGLYCAN PDOC00002
PS00002 82->86 GLYCOSAMINOGLYCAN PDOC00002
PS00002 333->337 GLYCOSAMINOGLYCAN PDOC00002
PS00004 14->18 CAMP_PHOSPHO_SITE PDOC00004
PS00005 34->37 PKC_PHOSPHO_SITE PDOC00005
PS00005 122->125 PKC_PHOSPHO_SITE PDOC00005
PS00005 147->150 PKC_PHOSPHO_SITE PDOC00005
PS00005 190->193 PKC_PHOSPHO_SITE PDOC00005
PS00005 219->222 PKC_PHOSPHO_SITE PDOC00005
PS00005 246->249 PKC_PHOSPHO_SITE PDOC00005
PS00005 410->413 PKC_PHOSPHO_SITE PDOC00005
PS00006 34->38 CK2_PHOSPHO_SITE PDOC00006
PS00006 147->151 CK2_PHOSPHO_SITE PDOC00006
PS00006 190->194 CK2_PHOSPHO_SITE PDOC00006
PS00006 290->294 CK2_PHOSPHO_SITE PDOC00006
PS00006 317->321 CK2_PHOSPHO_SITE PDOC00006
PS00007 209->217 TYR_PHOSPHO_SITE PDOC00007
PS00007 208->217 TYR_PHOSPHO_SITE PDOC00007
PS00008 9->15 MYRISTYL PDOC00008
PS00008 20->26 MYRISTYL PDOC00008
PS00008 133->139 MYRISTYL PDOC00008
PS00008 238->244 MYRISTYL PDOC00008
PS00008 277->283 MYRISTYL PDOC00008
PS00008 302->308 MYRISTYL PDOC00008
PS00008 344->350 MYRISTYL PDOC00008
PS00009 12->16 AMIDATION PDOC00009
PS00009 206->210 AMIDATION PDOC00009
PS00626 179->190 RCC1_2 PDOC00544
PS00626 235->246 RCC1_2 PDOC00544



Pfam for DKFZphtes3_21d4 . 1 



HMM_NAME Regulator of chromosome condensation (RCC1)

HMM *IAaGqHHTVCLTqDGRVYtWG*

+A GQ+H++ LT++G VY++G
Query 235 VACGQDHSLFLTDKGEVYSCG 255



DKFZphtes3_21j15



group: transcription factors 

DKFZphtes3_21j15 encodes a novel 898 amino acid protein with similarity to human NY-CO-33
protein.

NY-CO-33 is a protein recognised by autologous antibodies of human colon cancer patients. The
novel protein contains 4 C2H2 zinc fingers and is a new putative transcription factor.

The new protein can find application in modulating/blocking the expression of genes controlled 
by this transcription factor. 
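
The C2H2 zinc fingers mentioned above can be located with a regular expression. The sketch below assumes the commonly cited PROSITE consensus for PS00028, C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H, which is not spelled out in this document. It is run on residues 28-56 of the frame 3 peptide and reports 1-based coordinates with an inclusive end; the function name and the offset parameter are illustrative.

```python
import re

# Lookahead so that overlapping candidate sites are all reported.
C2H2 = re.compile(r"(?=(C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H))")


def find_c2h2(peptide: str, first_pos: int = 1):
    """Return (start, end) pairs, 1-based with inclusive end, for each C2H2 pattern hit."""
    return [
        (first_pos + m.start(), first_pos + m.start() + len(m.group(1)) - 1)
        for m in C2H2.finditer(peptide)
    ]


# Residues 28-56 of the frame 3 peptide.
FRAGMENT = "ASKFRCKDCSAAYDTLVELTVHMNETGHY"
print(find_c2h2(FRAGMENT, first_pos=28))  # [(33, 55)]
```

The hit spans residues 33 to 55, which corresponds to the first of the four zinc finger sites in the Prosite listing for this clone if that listing prints the end position plus one.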



strong similarity to "NY-CO-33"

complete cDNA, complete cds, potential start at bp 27, EST hits 

Sequenced by LMU 

Locus: unknown 

Insert length: 4407 bp 

Poly A stretch at pos. 4321, polyadenylation signal at pos. 4301



1 CGCTGCAGCA GGTGTCACAG AGCCGCATGC TCCCGGAGCC CAGCCTCTTC 
51 AGCACCGTGC AGCTGTACCG GCAGAGCAGC AAGCTCTATG GCTCCATCTT 
101 CACGGGGGCC AGCAAGTTCC GCTGTAAGGA CTGCAGCGCT GCCTACGACA 
151 CCCTGGTGGA GTTGACAGTG CACATGAACG AGACGGGGCA TTACCGCGAC 
201 GACAACCATG AGACCGATAA CAACAACCCC AAGCGCTGGT CCAAGCCTCG 
251 CAAACGCTCC TTGCTGGAAA TGGAAGGGAA GGAAGACGCC CAGAAGGTGC 
301 TGAAGTGCAT GTACTGTGGC CACTCCTTTG AGTCCCTGCA GGATTTGAGT 
351 GTCCATATGA TCAAAACAAA ACACTACCAA AAAGTGCCTC TGAAGGAACC 
401 CGTCACTCCT GTCGCCGCCA AAATCATCCC TGCCACTCGG AAGAAAGCTT 
451 CCCTGGAGCT GGAGCTCCCC AGCTCCCCAG ATTCCACAGG TGGAACCCCC 
501 AAAGCCACCA TCTCAGACAC CAACGATGCA CTTCAGAAGA ACTCCAACCC 
551 TTACATCACG CCAAATAATC GGTACGGCCA CCAGAATGGG GCCAGCTATG 
601 CATGGCACTT TGAGGCCCGG AAGTCGCAGA TCCTGAAGTG CATGGAGTGT 
651 GGGAGCTCGC ATGACACCCT GCAGGAGCTC ACTGCCCACA TGATGGTCAC 
701 TGGCCACTTC ATCAAGGTCA CCAACTCTGC TATGAAAAAG GGGAAGCCCA 
751 TTGTGGAGAC GCCTGTCACA CCTACCATCA CAACCCTGCT GGATGAGAAG 
801 GTCCAGTCCG TGCCCCTGGC AGCCACCACC TTCACGTCCC CCTCCAATAC 
851 ACCTGCCAGC ATCTCCCCAA AACTGAATGT GGAGGTCAAG AAGGAAGTCG 
901 ACAAGGAGAA AGCGGTCACT GACGAGAAAC CTAAGCAAAA AGACAAGCCT 
951 GGCGAAGAAG AGGAGAAGTG TGACATCTCT TCCAAATACC ATTACTTGAC 
1001 TGAAAATGAC TTAGAAGAGA GTCCCAAGGG GGGGCTTGAT ATCCTCAAAT 
1051 CCTTGGAAAA CACAGTGACA TCCGCAATCA ACAAGGCCCA GAACGGCACT 
1101 CCTAGCTGGG GGGGCTATCC CAGCATCCAT GCCGCCTACC AACTTCCCAA 
1151 CATGATGAAG TTGTCCCTGG GCTCGTCGGG GAAGAGCACG CCCCTGAAAC 
1201 CCATGTTTGG CAACAGTGAG ATTGTCTCCC CGACGAAAAA CCAGACCCTG 
1251 GTCTCTCCAC CCAGCAGCCA GACGTCCCCC ATGCCCAAGA CAAACTTTCA 
1301 TGCCATGGAG GAGCTGGTGA AAAAGGTCAC TGAGAAAGTT GCCAAAGTGG 
1351 AGGAGAAGAT GAAGGAGCCG GATGGGAAGC TTTCCCCGCC CAAGCGGGCC 
1401 ACTCCCTCCC CATGTAGCAG CGAAGTCGGG GAACCCATCA AGATGGAGGC 
1451 ATCCAGCGAT GGGGGCTTCC GCAGCCAGGA GAACAGCCCC AGCCCCCCGC 
1501 GGGATGGGTG CAAGGATGGG AGCCCCCTCG CTGAGCCGGT GGAGAATGGC 
1551 AAGGAGCTGG TGAAGCCCCT AGCCAGCAGT TTGAGTGGCA GCACGGCCAT 
1601 CATCACCGAC CACCCGCCTG AACAGCCTTT TGTTAACCCT TTGAGCGCCC 
1651 TGCAGTCAGT CATGAACATT CACCTGGGCA AGGCCGCCAA GCCCTCCCTG 
1701 CCTGCCCTGG ACCCCATGAG CATGCTTTTC AAGATGAGCA ACAGCCTGGC 
1751 GGAGAAGGCT GCTGTGGCCA CCCCGCCGCC CCTGCAGTCC AAGAAGGCAG 
1801 ACCACCTCGA CCGCTATTTC TACCACGTCA ACAACGACCA GCCCATAGAC 
1851 TTGACAAAAG GGAAGAGTGA CAAAGGCTGC TCCTTGGGTT CAGTGCTTCT 
1901 GTCACCCACG TCCACAGCCC CGGCAACCTC CTCATCCACG GTGACAACGG 
1951 CAAAGACATC TGCCGTCGTA TCATTCATGT CAAACTCGCC GCTACGCGAG 
2001 AATGCCTTGT CAGATATATC CGATATGCTG AAGAACTTGA CAGAGAGCCA
2051 CACGTCAAAA TCCTCCACTC CTTCCAGCAT CTCCGAGAAG TCTGACATTG 
2101 ACGGGGCCAC TCTGGAGGAG GCTGAGGAGT CGACGCCCGC CCAGAAGAGG 
2151 AAGGGCCGCC AGTCAAACTG GAACCCCCAG CACCTCCTGA TCCTCCAGGC 
2201 CCAGTTTGCC GCCAGCCTCC GGCAGACCTC AGAAGGGAAG TACATCATGT 
2251 CAGACCTGAG CCCCCAGGAG CGGATGCATA TCTCCAGGTT CACCGGGCTG 
2301 TCCATGACCA CCATCAGCCA CTGGCTGGCC AACGTGAAAT ACCAGCTTCG 
2351 AAGGACAGGT GGAACAAAGT TCCTCAAAAA CTTGGACACT GGCCACCCCG 
2401 TCTTCTTTTG TAACGATTGT GCGTCCCAAA TCAGGACTCC TTCCACGTAC 
2451 ATCAGTCACC TAGAGTCACA CTTAGGCTTC CGGCTACGGG ACTTATCCAA 
2501 ACTGTCCACC GAACAGATTA ACAGTCAGAT AGCACAAACC AAGTCACCGT 
2551 CAGAAAAAAT GGTGACGTCC TCCCCCGAGG AAGACCTGGG GACTTCCTAT 
2601 CAGTGCAAAC TTTGCAATCG GACCTTTGCC AGCAAGCACG CTGTTAAACT
2651 TCACCTTAGC AAAACACACG GGAAATCTCC GGAAGACCAC CTTCTGTATG
2701 TCTCTGAGTT AGAGAAGCAG TAGCATTTGC TTTTGATAGA AAGGACTGCA
2751 GTTTGCTTTG AGGGAAACTG TGGAAGGCAC CTTCAGGCCC CCTCTGACTT
2801 GTTGTTCTTG GCACATGTTC TTATTTTAAC TGCAGAGAAT CACTCTGGGC
2851 TGGACTGTTT TGTATAACTG TACAGTGTTT AATAGAGGTG CATAATCAGC
2901 TGTTGTTACT GGTAAAATAT GAAGGTTAAA ATGCAGTGGT AAGTGTTTGG
2951 AACTTTGTGT AAACGGGATT TAGTTGTGAG CATCCTCCCG ATGCTTCAAG
3001 CTGCATGCAT TAACAGACAG TTTAATTAAG CATTTATAAC GGAATCAGGC
3051 ACACCTTTTC CACGAGACTC GAGTGTGCTG GCATTTCTCA CCCTTTCATC
3101 TTTAGCCCTC TGAGTACTTT GAAGCACTTT TGCATTAATT TGGTTAAAAA
3151 ATAAAATAAA ATAATAATAA TGTATGAAGC TCTGTTTTTT AAACTCCTTA
3201 CCAGCTTAGT TATAATGAAT AATATGAACC TCCATTTATG CAGGTCTGCA
3251 GGGGTATAAC ACGCCTTGAA ATTTAAAAGA ATATTATTTT CACATTGAAA
3301 CATAGATGTA TATATTGTAT AGATTTCAGA CTCTCTTATG AAAAAAAATG
3351 TGATTGTGGT TAAATGACCT TTTTCTTGCA TTTATAGCAA CAGTGTTTTA
3401 TGCACCTGCT ATGCTCTGGG CATAAGCTGT GCCTATGTAT AGTGTATATT
3451 TCTTTTTTTC TTTTTTTTAA GGTCTATGGG TTTTGTTTTT TACATGCAAA
3501 CATTGTAAAT TATACAGAAG ATACCACAGA TAGCATTTAT AAAGTATACA
3551 GAAACATTAT CTGAAAGCAA AGTATGATAG TTTGTTTTGC TATACAGTAC
3601 ATCTATATTG ATAGAGGTTC ATGTTTAAAT TATACATATT TATTAGCATC
3651 ATATTGTCAT TTGTTTTGAG CAGTCTGAAT AAACGAGACC GGGAAAGACA
3701 TCCCTGGCAG GCATCAGAAC TATTTTGCAC ATGATTTTTA AAGGTATTTA
3751 TTAGAAATCA AAGAACACTC AAAATAAACT CAGTGCTCAA AGGGTTAAGT
3801 CTATTTGAAA AGGTTAAAAA AAAGAACAAA AAAAAAAAAA GAACTTGTAC
3851 TGTATTTCCT AAACATTGAT AAAGCCTTTA AAATGTTTGT ACTGTAATAC
3901 TTTGCTTAAA AGTCATGAGG CATTCTGTGA TCCAACCTCT TTCACTTATT
3951 TATAAGCCCT CTTGGTTGCT ATTCCATATT GTAGGATGCC TTTCTATTTC
4001 AATTGGTAAC TTTCTGTTTT GTTCTTCCTA ATTATTCTCC CAAGATCCCA
4051 CACTGCAGCT TTATCTTTAG GCTTATGAAA GGTAACCCGT GGTTACCGGC
4101 TCTCCAAGTG ATTCTGTTCT TCTCCATTTT TGGCAGTTAA TTTGCAGAAG
4151 TAACTGACAG CTGACACCAT ATGAGAACCT TTGTATAAAA TATTGGCATG
4201 TAAACAGCAC AGACACCGTA ACACACTCTG TGCCCTGTTT GGTTGTTGAC
4251 AATGAAGCAC CATTATGTGA CTCTTCATAT AACCCTTTTT TCTACGGCAG
4301 CATTAAAATT GTCTTTTTGC TATAAAAAAA AAAAAAAAAA AAAAAAAAAA
4351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
4401 AAAAAAA



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 27 bp to 2720 bp; peptide length: 898 
Category: strong similarity to known protein 



1 MLPEPSLFST VQLYRQSSKL YGSIFTGASK FRCKDCSAAY DTLVELTVHM
51 NETGHYRDDN HETDNNNPKR WSKPRKRSLL EMEGKEDAQK VLKCMYCGHS
101 FESLQDLSVH MIKTKHYQKV PLKEPVTPVA AKIIPATRKK ASLELELPSS
151 PDSTGGTPKA TISDTNDALQ KNSNPYITPN NRYGHQNGAS YAWHFEARKS
201 QILKCMECGS SHDTLQELTA HMMVTGHFIK VTNSAMKKGK PIVETPVTPT
251 ITTLLDEKVQ SVPLAATTFT SPSNTPASIS PKLNVEVKKE VDKEKAVTDE
301 KPKQKDKPGE EEEKCDISSK YHYLTENDLE ESPKGGLDIL KSLENTVTSA
351 INKAQNGTPS WGGYPSIHAA YQLPNMMKLS LGSSGKSTPL KPMFGNSEIV
401 SPTKNQTLVS PPSSQTSPMP KTNFHAMEEL VKKVTEKVAK VEEKMKEPDG
451 KLSPPKRATP SPCSSEVGEP IKMEASSDGG FRSQENSPSP PRDGCKDGSP
501 LAEPVENGKE LVKPLASSLS GSTAIITDHP PEQPFVNPLS ALQSVMNIHL
551 GKAAKPSLPA LDPMSMLFKM SNSLAEKAAV ATPPPLQSKK ADHLDRYFYH
601 VNNDQPIDLT KGKSDKGCSL GSVLLSPTST APATSSSTVT TAKTSAVVSF
651 MSNSPLRENA LSDISDMLKN LTESHTSKSS TPSSISEKSD IDGATLEEAE
701 ESTPAQKRKG RQSNWNPQHL LILQAQFAAS LRQTSEGKYI MSDLSPQERM
751 HISRFTGLSM TTISHWLANV KYQLRRTGGT KFLKNLDTGH PVFFCNDCAS
801 QIRTPSTYIS HLESHLGFRL RDLSKLSTEQ INSQIAQTKS PSEKMVTSSP
851 EEDLGTSYQC KLCNRTFASK HAVKLHLSKT HGKSPEDHLL YVSELEKQ



BLASTP hits



No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_21j15, frame 3

TREMBL:AF039698_1 gene: "NY-CO-33"; product: "antigen NY-CO-33"; Homo
sapiens antigen NY-CO-33 (NY-CO-33) mRNA, complete cds., N = 1, Score =
1039, P = 5.5e-105

PIR:A38437 probable homeotic protein tsh - fruit fly (Drosophila
melanogaster), N = 3, Score = 158, P = 7.2e-09

TREMBL:CE33058_1 gene: "unc-89"; product: "UNC-89"; Caenorhabditis
elegans UNC-89 (unc-89) gene, complete cds., N = 2, Score = 175, P =
3.3e-07



>TREMBL:AF039698_1 gene: "NY-CO-33"; product: "antigen NY-CO-33"; Homo 
sapiens antigen NY-CO-33 (NY-CO-33) mRNA, complete cds. 
Length = 687 

HSPs: 



Score = 1039 (155.9 bits), Expect = 5.5e-105, P = 5.5e-105
Identities = 244/504 (48%), Positives = 319/504 (63%)

Query: 170 QKNSNPYITPNNRYGHQNGASYAWHFEARKSQILKCMECGSSHDTLQELTAHMMVTGHFI 229
QK +NPY+TPNNRYG+QNGASY W FEARK+QILKCMECGSSHDTLQ+LTAHMMVTGHF+
Sbjct: 14

Query: 230 KVTNSAMKKGKPIVETPVTPTITTLLDEKVQSVPLAATTFTS-PSNT PASISPKLN 284
KVT SA KKGK +V PV ++EK+QS+PL TT T P+++ P S +
Sbjct: 74 126

Query: 285 VEVKKEVDKEKA-VTDEKPKQKDKPGEEEEKCDISSKYHYLTENDLEESPKGGLDILKSL 343
E KKE +KEK V + K K++ + EK + S+ Y YL E DL++SPKGGLDILKSL
Sbjct: 127 SEEKKEPEKEKPPVAGDAEKIKEESEDSLEKFEPSTLYPYLREEDLDDSPKGGLDILKSL 186

Query: 344 ENTVTSAINKAQNGTPSWGGYPSIHAAYQLPNMMKLSLGSSGKSTPLKPMF-GNSEIVSP 402
ENTV++AI+KAQNG PSWGGYPSIHAAYQLP +K L ++ +S ++P + G + +S
Sbjct: 187 ENTVSTAISKAQNGAPSWGGYPSIHAAYQLPGTVK-PLPAAVQSVQVQPSYAGGVKSLSS 245

Query: 403 TKNQTLVSPPSSQTSPMPKTNFHAMEELVKKVTEKV-AKVEEKMKEPDGKLSPPKRATPS 461
++ L+ P S T P K+N AMEELV+KVT KV K EE+ E + K S K A S
Sbjct: 246 AEHNALLHSPGSLTPPPHKSNVSAMEELVEKVTGKVNIKKEERPPEKE-KSSLAKAA--S 302

Query: 462 PCSSEVGEPIKMEASSDGGFRSQENSPSPPRDGCKDGSPLAEPVENGKELVKPLASSLSG 521
P + E + K E S + Q+ P K PL NG E +K ++
Sbjct: 303 PIAKENKDFPKTEEVSG---KPQKKGPEAETWEAKKEGPLDVHTPNGTEPLKAKVTNGCN 359

Query: 522 STAIITDHPPEQPFVNPLSALQSVMNIHLGKAAKPSLPALDPMSMLFKMSNSLAEKAAVA 581
+ II DH PE F+NPLSALQS+MN HLGK +KP P+LDP++ML+K+SNS+ +K
Sbjct: 360 NLGIIMDHSPEPSFINPLSALQSIMNTHLGKVSKPVSPSLDPLAMLYKISNSMLDKPVYP 419

Query: 582 TPPPLQSKKADHLDRYFYHVNNDQPIDLTKGKSDK-GCSLGSVLLSPTSTAPATSSSTVT 640
P K+AD +DRY+Y N+DQPIDLTK K+ S+ + SP + S +
Sbjct: 420 ATPV---KQADAIDRYYYE-NSDQPIDLTKSKNKPLVSSVADSVASPLRESALMDISDMV 475

Query: 641 TAKTSAVVSFMSN-SPLRENALSDISDMLKNLTE 673
T+ SS + E + +DS +LE
Sbjct: 476 KNLTGRLTPKSSTPSTVSEKSDADGSSFEEALDE 509




Score = 865 (129.8 bits), Expect = 7.4e-95, P = 7.4e-95
Identities = 211/434 (48%), Positives = 268/434 (61%)

Query: 447 EPDGKLSPPKRATPSPCSSEVG--EPIKMEASSDGGFRSQENSPSPPRDG-CKDGSPLAE 503
E+LP TPPSV E+++ ++EP + K SP+A+
Sbjct: 247 EHNALLHSPGSLTPPPHKSNVSAMEELVEKVTGKVNIKKEERPPEKEKSSLAKAASPIAK 306

Query: 504 P-VE--NGKELVK-PLASSLSGSTAIITD-HPPE--QPFVNPLSALQSVMNIHLG 551
P E +GK KPA+ OHP +P ++ + + I +
Sbjct: 307 ENKDFPKTEEVSGKPQKKGPEAETWEAKKEGPLDVHTPNGTEPLKAKVTNGCNNLGIIMD 366

Query: 552 KAAKPSLPALDPMSMLFKMSNSLAEKAAVATPPPLQSKKADHLDRYFYHVNN DQPID 608
+ +PS ++P+S L+N+ K+ PL DL Y ++N D+P+
Sbjct: 367 HSPEPSF--INPLSALQSIMNTHLGKVSKPVSPSL DPL-AMLYKISNSMLDKPV- 417

Query: 609 LTKGKSDKGCSLGSVLLSPTSTAPATSSSTVTTAKTSAVVSFMSNSPLRENALSDISDML 668
K S P + + S+V ++ SPLRE+AL DISDM+
Sbjct: 418 -YPATPVKQADAIDRYYYENSDQPIDLTKSKNKPLVSSVADSVA-SPLRESALMDISDMV 475

Query: 669 KNLTESHTSKSSTPSSISEKSDIDGATLEEA-EESTPAQKRKGRQSNWNPQHLLILQAQF 727
KNLT T KSSTPS++SEKSD DG++ EEA +E +P KRKGRQSNWNPQHLLILQAQF
Sbjct: 476 KNLTGRLTPKSSTPSTVSEKSDADGSSFEEALDELSPVHKRKGRQSNWNPQHLLILQAQF 535

Query: 728 AASLRQTSEGKYIMSDLSPQERMHISRFTGLSMTTISHWLANVKYQLRRTGGTKFLKNLD 787
A+SLR+T+EGKYIMSDL PQER+HIS+FTGLSMTTISHWLANVKYQLRRTGGTKFLKNLD
Sbjct: 536 ASSLRETTEGKYIMSDLGPQERVHISKFTGLSMTTISHWLANVKYQLRRTGGTKFLKNLD 595

Query: 788 TGHPVFFCNDCASQIRTPSTYISHLESHLGFRLRDLSKLSTEQINSQIAQTKSPSEKMV- 846
TGHPVFFCNDCASQ RT STYISHLE+HLGF L+DLSKL QI Q +K + K +
Sbjct: 596 TGHPVFFCNDCASQFRTASTYISHLETHLGFSLKDLSKLPLNQIQEQQNVSKVLTNKTLG 655

Query: 847 -TSSPEEDLGTSYQCKLCNRTFASK 870
+ EEDLG+++QCKLCNRTFA +
Sbjct: 656 PLGATEEDLGSTFQCKLCNRTFAKQ 680




Score = 98 (14.7 bits), Expect = 7.4e-95, P = 7.4e-95
Identities = 32/95 (33%), Positives = 47/95 (49%)

Query: 90 KVLKCMYCGHSFESLQDLSVHMIKTKHYQKVPL KEPVT- PVAAKI I PATRKKAS 142
++LKCM CG S ++LQ L+ HM+ T H+ KV K+ V PV + I + +
Sbjct: 45 QILKCMECGSSHDTLQQLTAHMMVTGHFLKVTTSASKKGKQLVLDPVVEEKIQSIPLPPT 104

Query: 143 LELELPSS PDSTGGTPKATI SDTNDALQKNSNP 175
LP+S PDS G+ T S+ +K P
Sbjct: 105 THTRLPASSIKKQPDSPAGSTTSEEKKEPEKEKPP 139




Score = 81 (12.2 bits), Expect = 4.6e-93, P = 4.6e-93
Identities = 13/29 (44%), Positives = 20/29 (68%)

Query: 28 ASKFRCKDCSAAYDTLVELTVHMNETGHY 56
A +C +C +++DTL +LT HM TGH+
Sbjct: 44 AQILKCMECGSSHDTLQQLTAHMMVTGHF 72



Pedant information for DKFZphtes3_21j15, frame 3



Report for DKFZphtes3_21j15.3



[LENGTH] 898

[MW] 98486.72

[pI] 8.61

[HOMOL] TREMBL:AF039698_1 gene: "NY-CO-33"; product: "antigen NY-CO-33"; Homo sapiens antigen NY-CO-33 (NY-CO-33) mRNA, complete cds. 0.0

[BLOCKS] BL00028 Zinc finger, C2H2 type, domain proteins

[PIRKW] zinc finger 1e-06

[PIRKW] DNA binding 1e-06

[PIRKW] transcription regulation 1e-06

[PROSITE] MYRISTYL 9

[PROSITE] ZINC_FINGER_C2H2 4

[PROSITE] CAMP_PHOSPHO_SITE 5

[PROSITE] CK2_PHOSPHO_SITE 19

[PROSITE] TYR_PHOSPHO_SITE 2

[PROSITE] PKC_PHOSPHO_SITE 15

[PROSITE] ASN_GLYCOSYLATION 4

[PFAM] Zinc finger, C2H2 type

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 11.36 %



SEQ MLPEPSLFSTVQLYRQSSKLYGSIFTGASKFRCKDCSAAYDTLVELTVHMNETGHYRDDN 

SEG 

PRD ccccceeeeeeeeccccceeeeeeeccccceeecccchhhhhhhhhhhcccccccccccc 

SEQ HETDNNNPKRWSKPRKRSLLEMEGKEDAQKVLKCMYCGHSFESLQDLSVHMIKTKHYQKV 

SEG 

PRD cccccccccccccccchhhhhhhccchhhhhhhhhcccccchhhhheeeeeeeecceeee 

SEQ PLKEPVTPVAAKIIPATRKKASLELELPSSPDSTGGTPKATISDTNDALQKNSNPYITPN

SEG xxxxxxxxxx 

PRD eccccccceeeeeeehhhhhhhhhhcccccccccccccceeeeccchhhhhccccccccc 

SEQ NRYGHQNGASYAWHFEARKSQILKCMECGSSHDTLQELTAHMMVTGHFIKVTNSAMKKGK

SEG 

PRD ccccccccchhhhhhhhhhhhhhhhccccccccchhhhhhhhhhhceeeeeccccccccc 

SEQ PIVETPVTPTITTLLDEKVQSVPLAATTFTSPSNTPASISPKLNVEVKKEVDKEKAVTDE 

SEG xxxxxxxxxxxxx xxxxxxxxxxxxxxxx 

PRD ccccccccccchhhhhhhhccccccccccccccccccccccccccccccccchhhhhhcc 

SEQ KPKQKDKPGEEEEKCDISSKYHYLTENDLEESPKGGLDILKSLENTVTSAINKAQNGTPS

SEG

PRD ccccccccccccccchhhhhhhhhhhcccccccccchhhhhhhhhhhhhhhhhhcccccc

SEQ WGGYPSIHAAYQLPNMMKLSLGSSGKSTPLKPMFGNSEIVSPTKNQTLVSPPSSQTSPMP

SEG

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc

SEQ KTNFHAMEELVKKVTEKVAKVEEKMKEPDGKLSPPKRATPSPCSSEVGEPIKMEASSDGG

SEG xxxxxxxxxxxxxxxxxxxx

PRD ccchhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccccccccceeeeecccc

SEQ FRSQENSPSPPRDGCKDGSPLAEPVENGKELVKPLASSLSGSTAIITDHPPEQPFVNPLS

SEG

PRD cccccccccccccccccccccccccccccccccccccccccceeeeeccccccccccccc

SEQ ALQSVMNIHLGKAAKPSLPALDPMSMLFKMSNSLAEKAAVATPPPLQSKKADHLDRYFYH

SEG

PRD chhhhhhcccccccccccccchhhhhhhhhhhhhhccccccccccccccccccccceeee

SEQ VNNDQPIDLTKGKSDKGCSLGSVLLSPTSTAPATSSSTVTTAKTSAVVSFMSNSPLRENA

SEG xxxxxxxxxxxxxxxxxxxxxxxx

PRD ecccccceeecccccccccccceeecccccccccccceeeeceeeeeeeeccccccchhh

SEQ LSDISDMLKNLTESHTSKSSTPSSISEKSDIDGATLEEAEESTPAQKRKGRQSNWNPQHL

SEG xxxxxxxxxxxxxxxxxx

PRD hhhhhhhhhhhhcccccccccccceeecccccchhhhhhhhccchhhhhhcccccccchh

SEQ LILQAQFAASLRQTSEGKYIMSDLSPQERMHISRFTGLSMTTISHWLANVKYQLRRTGGT

SEG

PRD hhhhhhhhhhhhhccccceeecccccchhhhhhhhccccchhhhhhhhhhhhhhhhcccc

SEQ KFLKNLDTGHPVFFCNDCASQIRTPSTYISHLESHLGFRLRDLSKLSTEQINSQIAQTKS

SEG

PRD ceeecccccccceeecccceeeecccchhhhhhhhhhhhhhhhhcchhhhhhhhhhhhcc

SEQ PSEKMVTSSPEEDLGTSYQCKLCNRTFASKHAVKLHLSKTHGKSPEDHLLYVSELEKQ

SEG

PRD ccceeeeccccccccceeehhhhhhhhhhhhhhhhhccccccccccceeeeeeecccc



Prosite for DKFZphtes3_21j15.3



PS00001 51->55 ASN_GLYCOSYLATION PDOC00001
PS00001 405->409 ASN_GLYCOSYLATION PDOC00001
PS00001 670->674 ASN_GLYCOSYLATION PDOC00001
PS00001 864->868 ASN_GLYCOSYLATION PDOC00001
PS00004 69->73 CAMP_PHOSPHO_SITE PDOC00004
PS00004 75->79 CAMP_PHOSPHO_SITE PDOC00004
PS00004 139->143 CAMP_PHOSPHO_SITE PDOC00004
PS00004 432->436 CAMP_PHOSPHO_SITE PDOC00004
PS00004 456->460 CAMP_PHOSPHO_SITE PDOC00004
PS00005 17->20 PKC_PHOSPHO_SITE PDOC00005
PS00005 137->140 PKC_PHOSPHO_SITE PDOC00005
PS00005 157->160 PKC_PHOSPHO_SITE PDOC00005
PS00005 280->283 PKC_PHOSPHO_SITE PDOC00005
PS00005 318->321 PKC_PHOSPHO_SITE PDOC00005
PS00005 332->335 PKC_PHOSPHO_SITE PDOC00005
PS00005 384->387 PKC_PHOSPHO_SITE PDOC00005
PS00005 435->438 PKC_PHOSPHO_SITE PDOC00005
PS00005 588->591 PKC_PHOSPHO_SITE PDOC00005
PS00005 614->617 PKC_PHOSPHO_SITE PDOC00005
PS00005 641->644 PKC_PHOSPHO_SITE PDOC00005
PS00005 676->679 PKC_PHOSPHO_SITE PDOC00005
PS00005 686->689 PKC_PHOSPHO_SITE PDOC00005
PS00005 730->733 PKC_PHOSPHO_SITE PDOC00005
PS00005 842->845 PKC_PHOSPHO_SITE PDOC00005
PS00006 42->46 CK2_PHOSPHO_SITE PDOC00006
PS00006 78->82 CK2_PHOSPHO_SITE PDOC00006
PS00006 103->107 CK2_PHOSPHO_SITE PDOC00006
PS00006 149->153 CK2_PHOSPHO_SITE PDOC00006
PS00006 161->165 CK2_PHOSPHO_SITE PDOC00006
PS00006 210->214 CK2_PHOSPHO_SITE PDOC00006
PS00006 214->218 CK2_PHOSPHO_SITE PDOC00006
PS00006 253->257 CK2_PHOSPHO_SITE PDOC00006
PS00006 325->329 CK2_PHOSPHO_SITE PDOC00006
PS00006 573->577 CK2_PHOSPHO_SITE PDOC00006
PS00006 684->688 CK2_PHOSPHO_SITE PDOC00006
PS00006 689->693 CK2_PHOSPHO_SITE PDOC00006
PS00006 695->699 CK2_PHOSPHO_SITE PDOC00006
PS00006 745->749 CK2_PHOSPHO_SITE PDOC00006
PS00006 810->814 CK2_PHOSPHO_SITE PDOC00006
PS00006 840->844 CK2_PHOSPHO_SITE PDOC00006
PS00006 848->852 CK2_PHOSPHO_SITE PDOC00006
PS00006 884->888 CK2_PHOSPHO_SITE PDOC00006
PS00006 893->897 CK2_PHOSPHO_SITE PDOC00006
PS00007 732->740 TYR_PHOSPHO_SITE PDOC00007
PS00007 883->892 TYR_PHOSPHO_SITE PDOC00007
PS00008 22->28 MYRISTYL PDOC00008
PS00008 156->162 MYRISTYL PDOC00008
PS00008 188->194 MYRISTYL PDOC00008
PS00008 362->368 MYRISTYL PDOC00008
PS00008 479->485 MYRISTYL PDOC00008
PS00008 494->500 MYRISTYL PDOC00008
PS00008 498->504 MYRISTYL PDOC00008
PS00008 617->623 MYRISTYL PDOC00008
PS00008 757->763 MYRISTYL PDOC00008
PS00028 795->816 ZINC_FINGER_C2H2 PDOC00028
PS00028 860->882 ZINC_FINGER_C2H2 PDOC00028
PS00028 33->56 ZINC_FINGER_C2H2 PDOC00028
PS00028 94->117 ZINC_FINGER_C2H2 PDOC00028



Pfam for DKFZphtes3_21j15.3 



HMM_NAME Zinc finger, C2H2 type 

HMM *CpwPDCgKtFrrwsNLrRHMR. .T.H* 

C++ C ++ + +L+ HM+ H 
Query 33 CKD— CSAAYDTLVELTVHMNET-GH 



55 



26.69 (bits) f: 94 t: 116 Target: dkfzphtes3_21j15.3 strong similarity to "NY-CO-33" 

Alignment to HMM consensus: 
Query *CpwPDCgKtFrrwsNLrRHMR. .T.H* 

C + CG +F + +L HM+ H 

dkfzphtes3 94 CMY — CGHSFESLQDLSVHMIKT-KH 116 

Query f: 795 t: 815 Target: dkfzphtes3_21j15.3 strong similarity to "NY-CO-33" 

Alignment to HMM consensus: 
HMM *CpwPDCgKtFrrwsNLrRHMRTH* 

C++ C R++S+++ H+ +H 
Query 795 CND — CASQIRTPSTYISHLESH 815 

27.12 (bits) f: 860 t: 881 Target: dkfzphtes3_21j15.3 strong similarity to "NY-CO-33" 

Alignment to HMM consensus: 
Query *CpwPDCgKtFrrwsNLrRHMR.T .H* 

C+ C++TF +++ + H+ H 

dkfzphtes3 860 CKL— CNRTFASKHAVKLHLSK-TH 881 
group: intracellular transport and trafficking 

DKFZphtes3_21l16 encodes a novel 66 amino acid protein nearly identical to rat ribosome 
attached membrane protein 4 (RAMP4). 

The novel protein seems to be the human ortholog of rat RAMP4. RAMP4 is involved in the 
regulation of translocation of proteins into the endoplasmic reticulum, e.g. of the MHC class II 
associated invariant (gamma) chain. 

The new protein can find application in modulation of protein translocation into the 
endoplasmic reticulum. 



identical to rat ribosome attached membrane protein 4 

ORF Bp 316-513 (66 aa) see BLASTX 

Sequenced by LMU 

Locus: unknown 

Insert length: 2488 bp 

Poly A stretch at pos. 2464, polyadenylation signal at pos. 2442 



1 CTTCCTCTTT CACTCCGCGC TCACGGCGGC GGCCAAAGCG GCGGCGACGG 
51 CGGCGCGAGA ACGACCCGGC GGCCAGTTCT CTTCCTCCTG CGCACCTGCC 
101 CCGCTCGGTC AGTCAGTCGG CGGCCGGCGC CCGGCTTGTG CTCAGACCTC 
151 GCGCTTGCGG CGCCCAGGCC CAGCGGCCGT AGCTAGCGTC TGGCCTGAGA 
201 ACCTCGGCGC TCCGGCGGCG CGGGCACCAC GAGCCGAGCC TCGCAGCGGC 
251 TCCAGAGGAG GCAGGCGAGT GAGCGAGTCC GAGGGGTGGC CGGGGCAGGT 
301 GGTGGCGCCG CGAAGATGGT CGCCAAGCAA AGGATCCGTA TGGCCAACGA 
351 GAAGCACAGC AAGAACATCA CCCAGCGCGG CAACGTCGCC AAGACCTCGA 
401 GAAATGCCCC CGAAGAGAAG GCGTCTGTAG GACCCTGGTT ATTGGCTCTC 
451 TTCATTTTTG TTGTCTGTGG TTCTGCAATT TTCCAGATTA TTCAAAGTAT 
501 CAGGATGGGC ATGTGAAGTG ACTGACCTTA AGATGTTTCC ATTCTCCTGT 
551 GAATTTTAAC TTGAACTCAT TCCTGATGTT TGATACCCTG GTTGAAAACA 
601 ATTCAGTAAA GCATCCTGCC TCAGAATGAC TTTCCTATCA TGCTTCATGT 
651 GTCATTCCAA GGTTTCTTCA TGAGTCATTC CAAGTTTTCT AGTCCATACC 
701 ACAGTGCCTT GCAAAAAACA CCACATGAAT AAAGCAATAA AATTTGATTG 
751 TTAAGATACA GTAGTGGACC CTACTTATTC AGTCAATTAA GAGTAAGTTT 
801 TTTTATGTGG TTATTAAAAC AGTATGAACA ATTAGTCTAA CTCTGCATAG 
851 ACAGGGTCTA GATTTTGTTA ACCCAAATGT ATAACTGCAG TTAGCTTAAA 
901 TTACAATTTG AAGTCTTGTG GTTTTTATAT AGCTAGGCAC TTTATTACTC 
951 TTTTGAACTG AAAGCACACT CCCTTATAGG TTCATGTAAC TGTCCTGTAA 
1001 TAAGGTGCTT ATAAATGGAA CAACTACACA GCCTAGTTTT GCCACAACCT 
1051 TTAGCATCTA AAAAGTTTTA AAAGCTTCTA AATGTCTAAT ATAAAGGGAG 
1101 ATGCTTATAG CCACAACATC TATTTTACCA ATATTGTTTC CATTACACTA 
1151 CCTTGGATTT TGCATGAGTG AGTATAGTAA CCCAAGATGC CATAAAAAAA 
1201 AACTTGATCG TTTTCTGACT TAATTAGTTA CTGTGGTTTC ACTAAAAGCT 
1251 ACCGTGGTGG AGTGAAGTCA GTCAGGGAAG GTTTGTTTAT GTTACATTTA 
1301 TTTCACCAGA ACTATTTTAA TATATCAAAG GGGTTTACTA TGCCAAACAA 
1351 AATTCTAGGG AAAAATACTG CTAAAAATGG ATGCCTCATC AGAACATGCT 
1401 GTTGAGTCCA ATGTGCCATA AGACATTTTA GCATGTTAAA TAGCACTTTT 
1451 AATAGCAAAA AAAGGCACAT CAACTGCGAA GTTATCCTTA GTTTGCAAAT 
1501 GCTTTTTCTA GATTAATGAT TTTTCAATCA TTAGGGTACT AGACACATCA 
1551 GCCTAAAGTG GCATCTGGAA TTGAATGGAT TTACTGATAA TGATCAGTCT 
1601 TTAGTCTTCC CTTTGTTATA TGACTTTATA GGTTATGATT GATCAAATTT 
1651 ACGTTTTACT AATGGTAAGG GTGAGGGTCA TAGGGCAGGT TTTGGGTTTT 
1701 CTAGTACTGT TGAAAACTGC AAGTATTGGC TATTTGTATA CTTAGCCATA 
1751 ACTTGGTGAA AAAAAACCTG AGCAGTGTCT ATGTATTAAT GCGTTGGAAA 
1801 GAAAGCTGCT TGTGTTTGCT TTGTTAATTG CCTCAGGATA TTTCTTTTAA 
1851 AATAAGCTGT TTTAAGAGGA ACAGAAGGGA AATCTGCTAC CTAGTCTATA 
1901 CACAGCGTGA ACCTCACAGG GGGCTTCTGA TACCCTCAAA CATGGAGAAC 
1951 AGTAAGGGAG CAGAGTGGTT AAGGACTTTC AGGAACTTAA CTATTCTGGA 
2001 ATAAGGAATG AATCAACTGA CCTTGGGCCA GCAGGTTTTT AACTAAATTG 
2051 TTACTTGCCT TTCTCACCCA GTTAATCAGT CTCTGTACTT GTTTCCCTTT 
2101 TTGAAACAAG TGTCTTGGTT AACTAATTCT GTTTTATGGT TGTGCTAAAT 
2151 TCATAGCAGG TGCCTTATTC TTTGCTTTTA GTCAAACCAT TCCATATCAG 
2201 AATTTTCCTT GGTTTACTAT AGATATTTGG CTTTAAGTTG TTGTTTGTGT 
2251 TTTTTAATGT ACAATGTTCT GATAAATTTG ACTGTTAAAT TGCTATAGCT 
2301 AGCAATCATT TTACATATGT AAAAAATTGC ATTCCCTTTG TATTTCATGT 
2351 GTAATTCACC AATTAAGTGC AGTTTATATT CAGGTTGGAT TATGCATGTT 
2401 TAGGTAAACG AAAGCTGTGT CTTACTTGAT TTATTCTTTA AAAATAAAGT 
2451 TCCCTGAATA TTTGAAAAAA AAAAAAAAAA AAAAAAAA 
BLAST Results 



Entry HSCDN13 from database EMBL: 

H. sapiens (TL5) mRNA from LNCaP cell line 

Score = 1075, P = 5.8e-41, identities = 219/221 

Entry AF100470_1 from database TREMBLNEW: 

gene: "RAMP4"; product: "ribosome attached membrane protein 4"; Rattus 
norvegicus ribosome attached membrane protein 4 (RAMP4) mRNA, complete 
cds. 

Score = 331, P = 3.9e-28, identities = 66/66, positives = 66/66, frame 
+1 

Entry HSG19910 from database EMBL: 
human STS A002B48. 
Score = 530, P = 2.1e-17, identities = 108/109 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 316 bp to 513 bp; peptide length: 66 
Category: strong similarity to known protein 
Classification: Intracellular transport and traffic 



1 MVAKQRIRMA NEKHSKNITQ RGNVAKTSRN APEEKASVGP WLLALFIFVV 
51 CGSAIFQIIQ SIRMGM 
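
The ORF coordinates and the peptide length reported for this frame can be cross-checked with a little arithmetic. The sketch below assumes the coordinates are 1-based and inclusive and that the reported end coordinate excludes the stop codon; both assumptions are consistent with the numbers given here (316 to 513 bp for 66 residues), but the source does not state them. The function name `orf_peptide_length` is illustrative only.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Return the number of codons spanned by an ORF.

    Assumes 1-based, inclusive coordinates with the stop codon excluded.
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"ORF span of {span} bp is not a positive multiple of 3")
    return span // 3


# Coordinates reported for frame 1 of this clone.
print(orf_peptide_length(316, 513))
```

The same check can be applied to any other "ORF from X bp to Y bp" line in the listing; a span that is not a multiple of three would point to a transcription error in one of the coordinates.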

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_21l16, frame 1 

TREMBLNEW:RN0238236_1 gene: "ramp4"; product: "ribosome associated 
membrane protein RAMP4"; Rattus norvegicus mRNA for ribosome 
associated membrane protein RAMP4, N = 1, Score = 331, P = 6.2e-30 

TREMBL:AF100470_1 gene: "RAMP4"; product: "ribosome attached membrane 
protein 4"; Rattus norvegicus ribosome attached membrane protein 4 
(RAMP4) mRNA, complete cds., N = 1, Score = 331, P = 6.2e-30 



>TREMBLNEW:RN0238236_1 gene: "ramp4"; product: "ribosome associated membrane 
protein RAMP4"; Rattus norvegicus mRNA for ribosome associated membrane 
protein RAMP4 

Length = 75 

HSPs: 

Score = 331 (49.7 bits), Expect = 6.2e-30, P = 6.2e-30 
Identities = 66/66 (100%), Positives = 66/66 (100%) 

Query: 1 MVAKQRIRMANEKHSKNITQRGNVAKTSRNAPEEKASVGPWLLALFIFVVCGSAIFQIIQ 60 

MVAKQRIRMANEKHSKNITQRGNVAKTSRNAPEEKASVGPWLLALFIFVVCGSAIFQIIQ 
Sbjct: 10 MVAKQRIRMANEKHSKNITQRGNVAKTSRNAPEEKASVGPWLLALFIFVVCGSAIFQIIQ 69 

Query: 61 SIRMGM 66 

SIRMGM 
Sbjct: 70 SIRMGM 75 

No Pedant data available 
DKFZphtes3_21n23 



group: testes derived 

DKFZphtes3_15j18 encodes a novel 148 amino acid protein with strong similarity to rat 7acomp 
protein. 

No informative BLAST results; No predictive prosite, pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



strong similarity to rat 7acomp protein 
on genomic level encoded by AF107885 
Sequenced by LMU 
Locus: /map="14q24.3" 
Insert length: 3122 bp 

Poly A stretch at pos. 3070, polyadenylation signal at pos. 3045 



1 GGAAAACCTC GTGGGCTCAG CCCGGGAGAA AGGGCCAGGG AAGTTGGGTG 
51 GTTCTGTGCT TGGTCTGTCA ATGGAGGAGA TCAAAGTTTT ACGAAGGGTG 
101 AAGGAGGAGA ATGATCGGCG AGGTGGATTT ATTCGCATAT TTCCTACATC 
151 TGAGACATGG GAAATATATG GGTCCTACCT CGAGCATAAG ACCTCAATGA 
201 ACTATATGCT GGCAACACGC CTCTTCCAGG ACAGGGGAAA CCCAAGAAGA 
251 AGCTTATTGA CAGGAAGAAC ACGAATGACT GCTGATGGAG CGCCAGAATT 
301 GAAGATAGAG AGTCTGAATT CAAAGGCCAA GCTGCATGCT GCACTTTACG 
351 AGAGGAAGCT CCTGTCTCTG GAGGTGCGAA AACGTAGACG ACGGAGTAGC 
401 AGATTGAGGG CAATGAGGCC AAAATACCCA GTGATTACCC AACCAGCTGA 
451 AATGAATGTT AAAACTGAGA CAGAGAGTGA AGAGGAGGAA GAAGTCGCAT 
501 TAGATAATGA AGATGAAGAA CAGGAGGCTT CCCAGGAGGA GTCTGCAGGA 
551 TTTCTTAGAG AAAATCAAGC CAAATATACA CCCTCATTGA CAGCTTTGGT 
601 AGAAAATACA CCCAAAGAAA ATTCCATGAA AGTTCGTGAA TGGAATAATA 
651 AAGGTGGACA CTGCTGCAAA CTTGAGACTC AGGAGCTAGA GCCTAAATTT 
701 AACCTGATGC AGATTCTTCA AGATAATGGC AATCTTAGCA AAATGCAGGC 
751 CCGAATAGCA TTCTCTGCCT ATCTCCAGCA TGTTCAAATT CGCCTGATGA 
801 AAGACAGTGG CGGTCAGACG TTCAGTGCCA GTTGGGCTGC CAAAGAGGAT 
851 GAACAGATGG AGCTGGTTGT TCGTTTCCTC AAGCGAGCAT CAAATAACCT 
901 CCAGCATTCA CTGAGGATGG TATTACCCAG TCGACGATTG GCACTTCTGG 
951 AACGCAGAAG AATCCTGGCC CACCAGCTGG GTGACTTTAT CATTGTATAC 
1001 AACAAGGAAA CAGAACAAAT GGCTGAAAAG AAATCAAAGA AGAAAGTTGA 
1051 GGAAGAAGAG GAAGATGGGG TGAATATGGA AAACTTTCAG GAGTTCATCA 
1101 GACAAGCAAG TGAGGCTGAA CTGGAGGAGG TGTTGACTTT TTATACCCAA 
1151 AAGAACAAGT CTGCTAGTGT CTTCCTGGGG ACTCACTCTA AAATTTCTAA 
1201 GAACAACAAC AATTATTCTG ATAGTGGGGC AAAAGGTGAT CACCCTGAGA 
1251 CTATAATGGA AGAAGTGAAA ATAAAGCCAC CTAAACAGCA ACAGACGACA 
1301 GAAATTCATT CTGATAAATT ATCTCGATTT ACCACTTCAG CAGAAAAAGA 
1351 GGCAAAATTA GTTTATAGCA ATTCCTCCTC TGGTCCTACT GCTACTCTGC 
1401 AGAAAATTCC CAACACCCAT TTGTCATCTG TTACAACCTC TGACCTCTCT 
1451 CCAGGGCCTT GCCACCATTC TTCTTTATCT CAAATTCCTT CAGCTATCCC 
1501 CAGCATGCCT CACCAGCCAA CAATTTTACT GAACACAGTC TCTGCCAGTG 
1551 CTTCTCCCTG CCTACATCCC GGGGCACAGA ACATCCCAAG CCCTACTGGC 
1601 CTGCCACGCT GTCGATCAGG AAGTCACACC ATTGGTCCCT TTTCTTCCTT 
1651 CCAAAGTGCT GCACACATCT ATAGCCAGAA ACTGTCTCGT CCCTCTTCAG 
1701 CAAAGGCAGG ATCGTGCTAT CTAAACAAGC ATCATTCAGG AATAGCCAAA 
1751 ACACAAAAAG AGGGAGAAGA TGCTTCTTTA TATAGCAAAC GGTACAACCA 
1801 AAGTATGGTT ACAGCTGAAC TTCAGCGGCT AGCTGAGAAG CAGGCAGCGA 
1851 GACAGTATTC TCCATCCAGC CACATCAACC TCCTCACCCA ACAGGTAACA 
1901 AACCTGAATT TGGCAACTGG CATCATAAAC AGAAGCAGTG CTTCAGCTCC 
1951 CCCAACCCTC CGACCCATCA TCAGTCCTAG TGGCCCGACA TGGTCTACAC 
2001 AGTCAGACCC CCAAGCTCCC GAGAATCACT CCAGCTCTCC TGGAAGCAGG 
2051 AGCCTGCAGA CAGGGGGATT TGCCTGGGAA GGAGAAGTAG AAAACAACGT 
2101 GTACAGCCAG GCTACAGGGG TGGTCCCCCA GCACAAGTAT CACCCCACAG 
2151 CAGGCAGCTA TCAGCTTCAA TTTGCCCTGC AGCAACTTGA ACAACAAAAA 
2201 CTTCAGTCCC GGCAGCTCCT GGACCAGAGT CGAGCCCGGC ACCAGGCAAT 
2251 CTTTGGCAGC CAGACACTAC CTAACTCCAA TTTATGGACA ATGAATAATG 
2301 GTGCAGGTTG TAGAATTTCC AGTGCCACAG CTAGTGGCCA GAAGCCAACC 
2351 ACTCTGCCAC AAAAAGTGGT ACCACCTCCA AGTTCTTGCG CCTCCCTGGT 
2401 TCCCAAACCC CCACCCAACC ACGAACAAGT GCTCAGAAGG GCAACATCCC 
2451 AGAAAGCTTC CAATACCCGC TTCAGATCCT CCTTTCAAAA CTATTTGTGG 
2501 TATTTCTTCC AAGCAGTCAG CTGAACTGAG GACGACAGCC TACAAACAAC 
2551 TACATGCATC TGAACTGTCT CTTGTAAATG AGCTTTTTTC AGAGCCAGAA 
2601 TCATACTCTC CAGGAAATAT GGAGAAAGAA ACCTGAGGAG ATTGAAGTTT 
2651 GCCAGGCACA AGGGCAAAAC TCAGACTGAA TGAATTTGAA AGGGTGGGGC 
2701 CAAAGATGTT GTAACCTGGG AGACTTCTCT GAAGAAAGAA AACTGTTTAA 
2751 GAAACACAGA CTGAACTGCA GTACTTTTCC TTAAATAGCT GAGATGACCT 
2801 TCTTTACCCT GGGCTTAGGT GATTCTCATC AGGGTGACCT GAGTGGAAGT 
2851 TGGTGGTAAC GACTGTTCTG TGTCAGCACC CAGGACAGTG GTGTCTGTTA 
2901 AGGCTGCCAG GGATTAGCAG GGAGGAAAGC CATCAGGACT GGGTAGCCTG 
2951 GTAGCACCAA ATCCCAATTA ATGTTACCTG AACATGTGGT GAGGTCAGCC 
3001 GTATGATGAA AGATGTTTAA GAGATTAATG TCAGAAGAAT ATGAAAATAA 
3051 ACACCGGCTT AAAAAATGTT AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3101 AAAAAAAAAA AAAAAAAAAA AA 



BLAST Results 



Entry AF107885 from database EMBL: 

Homo sapiens chromosome 14q24.3 clone BAC270M14 transforming growth 
factor-beta 3 (TGF-beta 3) gene, complete cds; and unknown genes. 
Score = 3042, P = 3.0e-219, identities = 610/612 
5 exons matching 1893-3070 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 71 bp to 2521 bp; peptide length: 817 
Category: strong similarity to known protein 



1 MEEIKVLRRV KEENDRRGGF IRIFPTSETW EIYGSYLEHK TSMNYMLATR 

51 LFQDRGNPRR SLLTGRTRMT ADGAPELKIE SLNSKAKLHA ALYERKLLSL 

101 EVRKRRRRSS RLRAMRPKYP VITQPAEMNV KTETESEEEE EVALDNEDEE 

151 QEASQEESAG FLRENQAKYT PSLTALVENT PKENSMKVRE WNNKGGHCCK 

201 LETQELEPKF NLMQILQDNG NLSKMQARIA FSAYLQHVQI RLMKDSGGQT 

251 FSASWAAKED EQMELVVRFL KRASNNLQHS LRMVLPSRRL ALLERRRILA 

301 HQLGDFIIVY NKETEQMAEK KSKKKVEEEE EDGVNMENFQ EFIRQASEAE 

351 LEEVLTFYTQ KNKSASVFLG THSKISKNNN NYSDSGAKGD HPETIMEEVK 

401 IKPPKQQQTT EIHSDKLSRF TTSAEKEAKL VYSNSSSGPT ATLQKIPNTH 

451 LSSVTTSDLS PGPCHHSSLS QIPSAIPSMP HQPTILLNTV SASASPCLHP 

501 GAQNIPSPTG LPRCRSGSHT IGPFSSFQSA AHIYSQKLSR PSSAKAGSCY 

551 LNKHHSGIAK TQKEGEDASL YSKRYNQSMV TAELQRLAEK QAARQYSPSS 

601 HINLLTQQVT NLNLATGIIN RSSASAPPTL RPIISPSGPT WSTQSDPQAP 
651 ENHSSSPGSR SLQTGGFAWE GEVENNVYSQ ATGVVPQHKY HPTAGSYQLQ 

701 FALQQLEQQK LQSRQLLDQS RARHQAIFGS QTLPNSNLWT MNNGAGCRIS 

751 SATASGQKPT TLPQKVVPPP SSCASLVPKP PPNHEQVLRR ATSQKASNTR 
801 FRSSFQNYLW YFFQAVS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_21n23, frame 2 

TREMBL:AF064856_1 product: "7acomp protein"; Rattus sp. 7acomp protein 
mRNA, complete cds., N = 1, Score = 1845, P = 2.2e-190 

TREMBL:AF107885_3 product: "unknown"; Homo sapiens chromosome 14q24.3 
clone BAC270M14 transforming growth factor-beta 3 (TGF-beta 3) gene, 
complete cds; and unknown genes., N = 1, Score = 443, P = 5.3e-41 

TREMBL:AF107885_4 product: "unknown"; Homo sapiens chromosome 14q24.3 
clone BAC270M14 transforming growth factor-beta 3 (TGF-beta 3) gene, 
complete cds; and unknown genes., N = 1, Score = 265, P = 8.2e-22 



>TREMBL:AF064856_1 product: "7acomp protein"; Rattus sp. 7acomp protein 
mRNA, complete cds. 

Length = 436 

HSPs: 

Score = 1845 (276.8 bits), Expect = 2.2e-190, P = 2.2e-190 
Identities = 369/435 (84%), Positives = 395/435 (90%) 
Query: 115 MRPKYPVITQPAEMNVKTETESEEEEEVALDNEDEEQEASQEESAGFLRENQAKYTPSLT 174 

MRPKYPVIT PAEMN+KTETESEEEEEV LDNEDEEQEASQEESAG L ENQAKYTPSLT 
Sbjct: 1 MRPKYPVITLPAEMNIKTETESEEEEEVGLDNEDEEQEASQEESAGSLAENQAKYTPSLT 60 

Query: 175 ALVENTPKENSMKVREWNNKGGHCCKLETQELEPKFNLMQILQDNGNLSKMQARIAFSAY 234 

+VEN+P+EN+MKV EW NKG CCK+ETQE E KFNLMQILQDNGNLSK+QAR+AFSAY 
Sbjct: 61 VIVENSPRENAMKVAEWTNKGESCCKIETQEPESKFNLMQILQDNGNLSKVQARLAFSAY 120 

Query: 235 LQHVQIRLMKDSGGQTFSASWAAKEDEQMELVVRFLKRASNNLQHSLRMVLPSRRLALLE 294 

LQHVQ+RL KDSGGQT S SWAAKEDEQMELVVRFLKRAS+NLQHSLRMVLPSRRLALLE 
Sbjct: 121 LQHVQVRLTKDSGGQTLSPSWAAKEDEQMELVVRFLKRASSNLQHSLRMVLPSRRLALLE 180 

Query: 295 RRRILAHQLGDFIIVYNKETEQMAEKKSKKKVEEEEEDGVNMENFQEFIRQASEAELEEV 354 

RRRILAHQLGDFI+VYNKETEQMAEKKSKKK+EEEEEDGVN E+FQEFIRQASEAELEEV 
Sbjct: 181 RRRILAHQLGDFIVVYNKETEQMAEKKSKKKLEEEEEDGVNAESFQEFIRQASEAELEEV 240 

Query: 355 LTFYTQKNKSASVFLGTHSKISKNNNNYSDSGAKGDHPETIMEEVKIKPPKQQQTTEIHS 414 

LTFYTQKNKSASVFLGTHSK SKN+++YSDSGAKGDHPETI +EVKIK PKQQQ TEIHS 
Sbjct: 241 LTFYTQKNKSASVFLGTHSKSSKNSSSYSDSGAKGDHPETI-QEVKIKQPKQQQATEIHS 299 

Query: 415 DKLSRFTTSAEKEAKLVYSNSSS--GPTATL-QKIPNTHLSSV-TTSDLSPGPCHHSSLS 470 

DKLSRFTTSA KEAKLVY+N SS GP A L Q++P+THLSS+ TTS LS GP HHSSLS 
Sbjct: 300 DKLSRFTTSAGKEAKLVYTNCSSFSGPAAVLLQRLPSTHLSSIITTSTLSSGPGHHSSLS 359 

Query: 471 QIPSAIPSMPHQPTILLNTVSASASPCLHPGAQNIPSPTGLPRCRSGSHTIGPFSSFQSA 530 

QI AIPSMPHQ +LLN V SASP +HPG N+ SP GLPRCRSGS+TIGPFSSFQSA 
Sbjct: 360 QISPAIPSMPHQSALLLNPVPDSASPPVHPGTPNV-SPAGLPRCRSGSYTIGPFSSFQSA 418 

Query: 531 AHIYSQKLSRPSSAKAG 547 

AHIYSQKLSRPSSAKAG 
Sbjct: 419 AHIYSQKLSRPSSAKAG 435 
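
The percentages printed next to the identity and positive counts in this HSP (369/435 as 84%, 395/435 as 90%) look as if they were truncated rather than rounded, since 369/435 is about 84.8%. The following is a minimal sketch of that arithmetic, assuming integer truncation; the function name `percent_truncated` is hypothetical and not part of any BLAST tool.

```python
def percent_truncated(matches: int, aligned: int) -> int:
    """Percentage of matching positions, truncated toward zero."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


# Counts taken from the HSP header above.
for label, matches in (("Identities", 369), ("Positives", 395)):
    print(f"{label} = {matches}/435 ({percent_truncated(matches, 435)}%)")
```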



Pedant information for DKFZphtes3_21n23, frame 2 



Report for DKFZphtes3_21n23 . 2 



[LENGTH] 817 

[MW] 91522.09 

[pI] 9.32 

[HOMOL] TREMBL:AF064856_1 product: "7acomp protein"; Rattus sp. 7acomp protein mRNA, 

complete cds. 1e-166 

[PROSITE] MYRISTYL 6 

[PROSITE] CAMP_PHOSPHO_SITE 4 

[PROSITE] CK2_PHOSPHO_SITE 12 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 15 

[PROSITE] ASN_GLYCOSYLATION 7 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 13.83 % 
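
The LOW_COMPLEXITY figure is presumably the share of residues that the SEG lines mark with `x`. A minimal sketch of that calculation follows, assuming every `x` marks exactly one masked residue; the helper names are illustrative, and the total of 113 masked positions is an assumed value (it is not printed in the report) chosen because 113 out of 817 residues gives the reported 13.83 %.

```python
def count_masked(seg_lines):
    """Count positions marked 'x' across a list of SEG mask lines."""
    return sum(line.count("x") for line in seg_lines)


def low_complexity_percent(masked: int, length: int) -> float:
    """Share of masked residues as a percentage, rounded to two decimals."""
    return round(100 * masked / length, 2)


# Toy mask lines, only to show the counting step.
print(count_masked(["....xxxx....", "xx.........."]))

# Assumed total of 113 masked residues in an 817-residue peptide.
print(low_complexity_percent(113, 817))
```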



SEQ MEEIKVLRRVKEENDRRGGFIRIFPTSETWEIYGSYLEHKTSMNYMLATRLFQDRGNPRR 

SEG 

PRD ccchhhhhhhhhhhccccceeeecccccceeeecceeeecccchhhhhhhhhhhcccccc 

SEQ SLLTGRTRMTADGAPELKIESLNSKAKLHAALYERKLLSLEVRKRRRRSSRLRAMRPKYP 

SEG xxxxxxxxxxxxxxxxxxxx 

PRD ccccccceeeccccceeeeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccc 

SEQ VITQPAEMNVKTETESEEEEEVALDNEDEEQEASQEESAGFLRENQAKYTPSLTALVENT 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ceeeccchhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhccccceeeeeccc 

SEQ PKENSMKVREWNNKGGHCCKLETQELEPKFNLMQILQDNGNLSKMQARIAFSAYLQHVQI 

SEG 

PRD cccccceeeeeccccccccchhhhhhhccchhhhhhhcccchhhhhhhhhhhhhhhhhhh 

SEQ RLMKDSGGQTFSASWAAKEDEQMELVVRFLKRASNNLQHSLRMVLPSRRLALLERRRILA 

SEG xxxxxxxxxxxxxxx. 

PRD hhhhcccccceeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhh 

SEQ HQLGDFIIVYNKETEQMAEKKSKKKVEEEEEDGVNMENFQEFIRQASEAELEEVLTFYTQ 

SEG xxxxxxxxxxxxx 

PRD hhccceeeeeehhhhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ KNKSASVFLGTHSKISKNNNNYSDSGAKGDHPETIMEEVKIKPPKQQQTTEIHSDKLSRF 

SEG 

PRD ccccceeeecccccccccccccccccccccccchhhhhhhccccccceeeeecccccccc 
SEQ TTSAEKEAKLVYSNSSSGPTATLQKIPNTHLSSVTTSDLSPGPCHHSSLSQIPSAIPSMP 

SEG 

PRD hhhhhhhheeeecccccccceeeecccccccccccccccccccccccccccccccccccc 

SEQ HQPTILLNTVSASASPCLHPGAQNIPSPTGLPRCRSGSHTIGPFSSFQSAAHIYSQKLSR 

SEG 

PRD cccceeeeccccccccccccccccccccccccccccccccccccccchhhhhhhhhhccc 

SEQ PSSAKAGSCYLNKHHSGIAKTQKEGEDASLYSKRYNQSMVTAELQRLAEKQAARQYSPSS 

SEG 

PRD cccccccceeeecccccccccccccccceeeecchhhhhhhhhhhhhhhhhhhhhhcccc 

SEQ HINLLTQQVTNLNLATGIINRSSASAPPTLRPIISPSGPTWSTQSDPQAPENHSSSPGSR 

SEG . . xxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccceeeecccccccccccccccccccccccccc 

SEQ SLQTGGFAWEGEVENNVYSQATGVVPQHKYHPTAGSYQLQFALQQLEQQKLQSRQLLDQS 

SEG xxxxxxxxxxxxxxxxxxxx . . . 

PRD cccccccceeeeecccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ RARHQAIFGSQTLPNSNLWTMNNGAGCRISSATASGQKPTTLPQKVVPPPSSCASLVPKP 

SEG 

PRD hhhhhhhhccccccccceeeeccccceeeeeeeccccccccccceeecccccceeecccc 

SEQ PPNHEQVLRRATSQKASNTRFRSSFQNYLWYFFQAVS 

SEG 

PRD cccchhhhhhhhhhhcccccccccccceeeeeeeccc 



Prosite for DKFZphtes3_21n23.2 
->225 
ASN_GLYCOSYLATION 
->385 
ASN_GLYCOSYLATION 
ASN_GLYCOSYLATION 
ASN_GLYCOSYLATION 
PS00001 
620->624 
ASN_GLYCOSYLATION 
ASN_GLYCOSYLATION 
CAMP_PHOSPHO_SITE 
->H1 
CAMP_PHOSPHO_SITE 
PDOC00004 
PS00004 
271- 
CAMP_PHOSPHO_SITE 
CAMP_PHOSPHO_SITE 
PS00005 
64->67 
PKC_PHOSPHO_SITE 
PKC_PHOSPHO_SITE 
->183 
PKC_PHOSPHO_SITE 
PS00005 
PKC_PHOSPHO_SITE 
->290 
PKC_PHOSPHO_SITE 
PS00005 
359- 
629->632 
PKC_PHOSPHO_SITE 
PDOC00006 
PS00006 
134- 
CK2_PHOSPHO_SITE 
CK2_PHOSPHO_SITE 
PS00006   394->398   CK2_PHOSPHO_SITE   PDOC00006
PS00006   422->426   CK2_PHOSPHO_SITE   PDOC00006
PS00006   455->459   CK2_PHOSPHO_SITE   PDOC00006
PS00006   561->565   CK2_PHOSPHO_SITE   PDOC00006
PS00006   643->647   CK2_PHOSPHO_SITE   PDOC00006
PS00007   563->572   TYR_PHOSPHO_SITE   PDOC00007
PS00008   195->201   MYRISTYL           PDOC00008
PS00008   248->254   MYRISTYL           PDOC00008
PS00008   510->516   MYRISTYL           PDOC00008
PS00008   557->563   MYRISTYL           PDOC00008
PS00008   746->752   MYRISTYL           PDOC00008
PS00008   756->762   MYRISTYL           PDOC00008



(No Pfam data available for DKFZphtes3_21n23.2) 
DKFZphtes3_22c23 
group: testes derived 

DKFZphtes3_22c23 encodes a novel 223 amino acid protein without similarity to known proteins. 

No informative BLAST results; No predictive prosite, pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

unknown 

complete cDNA, complete cds, 3 EST hits (two from a testis library) 
Sequenced by LMU 
Locus: /map="9q34" 
Insert length: 1113 bp 

Poly A stretch at pos. 1073, polyadenylation signal at pos. 1055 

1 GGTGGGCAAA GGCATCTTCC TCTGGGAAGG ACTGGCACAA GCACTTGGTC 
51 CCTGGGTTGT GTGCCTGGGA GGCCGGGATC AGGGCTGGCC CTCTTTCTCC 

101 CTGGCAAAGC AAAACCTCCC TTTTACTACT ATCAAGGGGA AGTAACTTGA 

151 AGGTGCCTGT GGCAGGCAGC ACCTTGAGCC AACAGGAACC ATTGACATGC 

201 GAGGCCCAGG GCAGGCAGAC TGTGCAGTGG CCATTGGGCG GCCCCTCGGG 

251 GAGGTGGTGA CCCTCCGCGT CCTTGAGAGT TCTCTCAACT GCAGTGCGGG 

301 GGACATGTTG CTGCTTTGGG GCCGGCTCAC CTGGAGGAAG ATGTGCAGGA 

351 AGCTGTTGGA CATGACTTTC AGCTCCAAGA CCAACACGCT GGTGGTGAGG 

401 CAGCGCTGCG GGCGGCCAGG AGGTGGGGTG CTGCTGCGGT ATGGGAGCCA 

451 GCTTGCTCCT GAAACCTTCT ACAGAGAATG TGACATGCAG CTCTTTGGGC 

501 CCTGGGGTGA AATCGTGAGC CCCTCGCTGA GTCCAGCCAC GAGTAATGCA 

551 GGGGGCTGCC GGCTCTTCAT TAATGTGGCT CCGCACGCAC GGATTGCCAT 

601 CCATGCCCTG GCCACCAACA TGGGCGCTGG GACCGAGGGA GCCAATGCCA 

651 GCTACATCTT GATCCGGGAC ACCCACAGCT TGAGGACCAC AGCGTTCCAT 

701 GGGCAGCAGG TGCTCTACTG GGAGTCAGAG AGCAGCCAGG CTGAGATGGA 

751 GTTCAGCGAG GGCTTCCTGA AGGCTCAGGC CAGCCTGCGG GGCCAGTACT 

801 GGACCCTCCA ATCATGGGTA CCGGAGATGC AGGACCCTCA GTCCTGGAAG 

851 GGAAAGGAAG GAACCTGAGG GTCATTGAAC ATTTGTTCCG TGTCTGGCCA 

901 GCCCTGGAGG GTTGACCCCT GGTCTCAGTG CTTTCCAATT CGAACTTTTT 

951 CCAATCTTAG GTATCTACTT TAGAGTCTTC TCCAATGTCC AAAAGGCTAG 
1001 GGGGTTGGAG GTGGGGACTC TGGAAAAGCA GCCCCCATTT CCTCGGGTAC 
1051 CAATAAATAA AACATGCAGG CTGAAAAAAA AAAAAAAAAA AAAAAAAAAA 
1101 AAAAAAAAAA AAA 
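
The reported polyadenylation signal and poly A stretch can be located directly in the 3' end of the insert. The sketch below scans the last 63 bases (positions 1051 to 1113, copied from the listing above) for the canonical AATAAA hexamer and for a terminal run of at least ten A residues. The motif, the minimum run length, and the function names are assumptions made for illustration; the source does not say how its positions were derived. With 1-based numbering the hexamer is found at 1052 and 1056 and the terminal A run starts at 1074, each one base higher than the reported 1055 and 1073, which would be consistent with the report using 0-based offsets.

```python
import re

TAIL_START = 1051  # 1-based position of the first base in TAIL
TAIL = "".join([
    "CAATAAATAA", "AACATGCAGG", "CTGAAAAAAA", "AAAAAAAAAA", "AAAAAAAAAA",
    "AAAAAAAAAA", "AAA",
])


def signal_positions(seq, offset, motif="AATAAA"):
    """1-based start positions of every (possibly overlapping) motif hit."""
    return [m.start() + offset for m in re.finditer(f"(?={motif})", seq)]


def poly_a_start(seq, offset, min_len=10):
    """1-based start of a terminal run of at least min_len A residues."""
    m = re.search(f"A{{{min_len},}}$", seq)
    return m.start() + offset if m else None


print(signal_positions(TAIL, TAIL_START))
print(poly_a_start(TAIL, TAIL_START))
```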

BLAST Results 



Entry HSAC1644 from database EMBL: 

Genomic sequence from Human 9q34, complete sequence. 
Score = 2072, P = 8.8e-225, identities = 422/430 
5 exons Bp 41969-38232 



Medline entries 

No Medline entry 



Peptide information for frame 2 



ORF from 197 bp to 865 bp; peptide length: 223 
Category: putative protein 



1 MRGPGQADCA VAIGRPLGEV VTLRVLESSL NCSAGDMLLL WGRLTWRKMC 
51 RKLLDMTFSS KTNTLVVRQR CGRPGGGVLL RYGSQLAPET FYRECDMQLF 
101 GPWGEIVSPS LSPATSNAGG CRLFINVAPH ARIAIHALAT NMGAGTEGAN 
151 ASYILIRDTH SLRTTAFHGQ QVLYWESESS QAEMEFSEGF LKAQASLRGQ 
201 YWTLQSWVPE MQDPQSWKGK EGT 
BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_22c23, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_22c23, frame 2 

Report for DKFZphtes3_22c23.2 



[LENGTH] 223 
[MW] 24546.19 
[pI] 8.57 
[PROSITE] MYRISTYL 4 
[PROSITE] CK2_PHOSPHO_SITE 
[PROSITE] PKC_PHOSPHO_SITE 
[PROSITE] ASN_GLYCOSYLATION 
[KW] Alpha_Beta 



SEQ MRGPGQADCAVAIGRPLGEVVTLRVLESSLNCSAGDMLLLWGRLTWRKMCRKLLDMTFSS 

PRD ccccccccceeeecccccceeeeehhhhhcccccchhhhhhchhhhhhhhhhhhhhhccc 

SEQ KTNTLVVRQRCGRPGGGVLLRYGSQLAPETFYRECDMQLFGPWGEIVSPSLSPATSNAGG 

PRD ccceeeeeecccccccceeeeccccccchhhhhhhhhccccccceeeecccccccccccc 

SEQ CRLFINVAPHARIAIHALATNMGAGTEGANASYILIRDTHSLRTTAFHGQQVLYWESESS 

PRD ceeeeeecccceeehhhhhhhhccccccccceeeeeecccccceeecccceeeeeccccc 

SEQ QAEMEFSEGFLKAQASLRGQYWTLQSWVPEMQDPQSWKGKEGT 

PRD hhhhhhhcchhhhhhhhhhcccccccccccccccccccccccc 



Prosite for DKFZphtes3_22c23.2 



PS00001   31->35     ASN_GLYCOSYLATION   PDOC00001
PS00001   150->154   ASN_GLYCOSYLATION   PDOC00001
PS00005   22->25     PKC_PHOSPHO_SITE    PDOC00005
PS00005   45->48     PKC_PHOSPHO_SITE    PDOC00005
PS00005   59->62     PKC_PHOSPHO_SITE    PDOC00005
PS00005   161->164   PKC_PHOSPHO_SITE    PDOC00005
PS00005   196->199   PKC_PHOSPHO_SITE    PDOC00005
PS00005   216->219   PKC_PHOSPHO_SITE    PDOC00005
PS00006   33->37     CK2_PHOSPHO_SITE    PDOC00006
PS00006   180->184   CK2_PHOSPHO_SITE    PDOC00006
PS00008   5->11      MYRISTYL            PDOC00008
PS00008   145->151   MYRISTYL            PDOC00008
PS00008   148->154   MYRISTYL            PDOC00008
PS00008   199->205   MYRISTYL            PDOC00008
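
The Prosite hits listed for this peptide can be reproduced with ordinary regular expressions. The sketch below is an illustration rather than the tool used in the source: the four patterns are written from the PROSITE pattern syntax for PS00001, PS00005, PS00006 and PS00008 and should be checked against the PROSITE documentation, and the `scan` helper is hypothetical. A lookahead is used so that overlapping hits (such as the MYRISTYL sites at 145 and 148) are all reported, and each hit is given as start and start plus motif length, which is the convention the listing appears to follow (for example 22->25 for a three-residue PKC site).

```python
import re

# Regex renderings of the PROSITE patterns (assumed, see lead-in).
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",
}

# Frame 2 peptide of DKFZphtes3_22c23 as given in the listing.
PEPTIDE = (
    "MRGPGQADCAVAIGRPLGEVVTLRVLESSLNCSAGDMLLLWGRLTWRKMC"
    "RKLLDMTFSSKTNTLVVRQRCGRPGGGVLLRYGSQLAPETFYRECDMQLF"
    "GPWGEIVSPSLSPATSNAGGCRLFINVAPHARIAIHALATNMGAGTEGAN"
    "ASYILIRDTHSLRTTAFHGQQVLYWESESSQAEMEFSEGFLKAQASLRGQ"
    "YWTLQSWVPEMQDPQSWKGKEGT"
)


def scan(peptide, regex):
    """Return (start, start + motif length) for every hit, 1-based start."""
    hits = []
    for m in re.finditer(f"(?=({regex}))", peptide):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits


for name, regex in PATTERNS.items():
    for start, end in scan(PEPTIDE, regex):
        print(f"{start}->{end} {name}")
```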



(No Pfam data available for DKFZphtes3_22c23.2) 
DKFZphtes3_22g2 



group: nucleic acid management 

DKFZphtes3_22g2 encodes a novel 1230 amino acid protein nearly identical to rat TIP120. 

The TATA-binding protein (TBP) is a central component of transcriptional regulation and is a target 
for various transcription regulators. TBP-interacting protein 120 (TIP120) is a protein 
that interacts with TBP. The novel protein is the human ortholog of 
rat TIP120 and is considered to participate in transcription 
regulation through its interaction with TBP. 

The new protein can find application in modulation of gene transcription. 

KIAA0829, complete cds, nearly identical to rat TIP120 
complete cDNA, complete cds, EST hits, 
Sequenced by LMU 

Locus: /map="387.3 cR from top of Chr12 linkage group" 
Insert length: 5387 bp 

Poly A stretch at pos. 5352, polyadenylation signal at pos. 5335 

1 GGGAGCGAGT GCGGAGCGAG TGGGAGCGAG ACGGCCCTGA GTGGAAGTGT 

51 CTGGCTCCCC GTAGAGGCCC TTCTGTACGC CCCGCCGCCC ATGAGCTCGT 

101 TCTCACGCGA ACAGCGCCGT CGTTAGGCTG GCTCTGTAGC CTCGGCTTAC 

151 CCCGGGACAG GCCCACGCCT CGCCAGGGAG GGGGCAGCCC GTCGAGGCGC 

201 CTCCCTAGTC AGCGTCGGCG TCGCGCTGCG ACCCTGGAAG CGGGAGCCGC 

251 CGCGAGCGAG AGGAGGAGCT CCAGTGGCGG CGGCGGCGGC GGCAGCGGCA 

301 GCGGGCAGCA GCTCCAGCAG CGCCAGCAGG CGGGATCGAG GCCGTCAACA 

351 TGGCGAGCGC CTCGTACCAC ATTTCCAATT TGCTGGAAAA AATGACATCC 

401 AGCGACAAGG ACTTTAGGTT TATGGCTACA AATGATTTGA TGACGGAACT 

451 GCAGAAAGAT TCCATCAAGT TGGATGATGA TAGTGAAAGG AAAGTAGTGA 

501 AAATGATTTT GAAGTTATTG GAAGATAAAA ATGGAGAGGT ACAGAATTTA 

551 GCTGTCAAAT GTCTTGGTCC TTTAGTGAGT AAAGTGAAAG AATACCAAGT 

601 AGAGACAATT GTAGATACCC TCTGCACTAA CATGCTTTCT GATAAAGAAC 

651 AACTTCGAGA CATTTCAAGT ATTGGTCTTA AAACAGTAAT TGGAGAACTT 

701 CCTCCAGCTT CCAGTGGCTC TGCATTAGCT GCTAATGTAT GTAAAAAGAT 

751 TACTGGACGT CTTACAAGTG CAATAGCAAA ACAGGAAGAT GTCTCTGTTC 

801 AGCTAGAAGC CTTGGATATT ATGGCTGATA TGTTGAGCAG GCAAGGAGGA 

851 CTTCTTGTTA ATTTCCATCC TTCAATTCTG ACCTGTCTAC TTCCCCAGTT 

901 GACCAGCCCT AGACTTGCAG TGAGGAAAAG AACCATTATC GCTCTTGGCC 

951 ATCTGGTTAT GAGCTGTGGA AATATAGTTT TTGTAGATCT TATTGAACAT 

1001 CTGTTGTCAG AGTTGTCCAA AAATGATTCT ATGTCAACAA CAAGAACCTA 

1051 CATACAATGT ATTGCTGCTA TTAGTAGGCA AGCTGGTCAT AGAATAGGTG 

1101 AATACCTTGA GAAGATAATT CCTTTGGTGG TAAAATTTTG CAATGTAGAT 

1151 GATGATGAAT TAAGAGAGTA CTGTATTCAA GCCTTTGAAT CATTTGTAAG 

1201 AAGATGTCCT AAGGAAGTAT ATCCTCATGT TTCTACCATT ATAAATATTT 

1251 GTCTTAAATA TCTTACCTAT GATCCAAATT ATAATTACGA TGATGAAGAT 

1301 GAAGATGAAA ATGCAATGGA TGCTGATGGT GGTGATGATG ATGATCAAGG 

1351 GAGTGATGAT GAATACAGTG ATGATGATGA CATGAGTTGG AAAGTGAGAC 

1401 GTGCAGCTGC GAAGTGCTTG GATGCTGTAG TTAGCACAAG GCATGAAATG 

1451 CTTCCAGAAT TCTACAAGAC CGTCTCTCCT GCACTAATAT CCAGATTTAA 

1501 AGAGCGTGAA GAGAATGTAA AGGCAGATGT TTTTCACGCA TACCTTTCTC 

1551 TTTTGAAGCA AACTCGTCCT GTACAAAGTT GGCTATGTGA CCCTGATGCA 

1601 ATGGAGCAGG GAGAAACACC TTTAACAATG CTTCAGAGTC AGGTTCCCAA 

1651 CATTGTTAAA GCTCTTCACA AACAGATGAA AGAAAAAAGT GTGAAGACCC 

1701 GACAGTGTTG TTTTAACATG TTAACTGAGC TGGTAAATGT ATTACCTGGG 

1751 GCCCTAACTC AACACATTCC TGTACTTGTA CCAGGAATCA TTTTCTCACT 

1801 GAATGATAAA TCAAGCTCAT CGAATTTGAA GATCGATGCT TTGTCATGTC 

1851 TATACGTAAT CCTCTGTAAC CATTCTCCTC AAGTCTTCCA TCCTCACGTT 

1901 CAGGCTTTGG TTCCTCCAGT GGTGGCTTGT GTTGGAGACC CATTTTACAA 

1951 AATTACATCT GAAGCACTTC TTGTTACTCA ACAGCTTGTC AAAGTAATTC 

2001 GTCCTTTAGA TCAGCCTTCC TCGTTTGATG CAACTCCTTA TATCAAAGAT 

2051 CTATTTACCT GTACCATTAA GAGATTAAAA GCAGCTGACA TTGATCAGGA 

2101 AGTCAAGGAA AGGGCTATTT CCTGTATGGG ACAAATTATT TGCAACCTTG 

2151 GAGACAATTT GGGTTCTGAC TTGCCTAATA CACTTCAGAT TTTCTTGGAG 

2201 AGACTAAAGA ATGAAATTAC CAGGTTAACT ACAGTAAAGG CATTGACACT 

2251 GATTGCTGGG TCACCTTTGA AGATAGATTT GAGGCCTGTT CTGGGAGAAG 

2301 GGGTTCCTAT CCTTGCTTCA TTTCTTAGAA AAAACCAGAG AGCTTTGAAA 

2351 CTGGGTACTC TTTCTGCCCT TGATATTCTA ATAAAAAACT ATAGTGACAG 

2401 CTTGACAGCT GCCATGATTG ATGCAGTTCT AGATGAGCTC CCACCTCTTA 

2451 TCAGCGAAAG TGATATGCAT GTTTCACAAA TGGCCATCAG TTTTCTTACC 

2501 ACTTTGGCAA AAGTATATCC CTCCTCCCTT TCAAAGATAA GTGGATCCAT 

2551 TCTCAATGAA CTTATTGGAC TTGTGAGATC ACCCTTATTG CAGGGGGGAG 
2601 CTCTTAGTGC CATGCTAGAC TTTTTCCAAG CTCTGGTTGT CACTGGAACA 
2651 AATAATTTAG GATACATGGA TTTGTTGCGC ATGCTGACTG GTCCAGTTTA 
2701 CTCTCAGAGC ACAGCTCTTA CTCATAAGCA GTCTTATTAT TCCATTGCCA 
2751 AATGTGTAGC TGCCCTTACT CGAGCATGCC CTAAAGAGGG ACCAGCTGTA 
2801 GTAGGTCAGT TTATTCAAGA TGTCAAGAAC TCAAGGTCTA CAGATTCCAT 
2851 TCGTCTCTTA GCTCTACTTT CTCTTGGAGA AGTTGGGCAT CATATTGACT 
2901 TAAGTGGACA GTTGGAACTA AAATCTGTAA TACTAGAAGC TTTCTCATCT 
2951 CCTAGTGAAG AAGTCAAATC AGCTGCATCC TATGCATTAG GCAGCATTAG 
3001 TGTGGGCAAC CTTCCTGAAT ATCTGCCGTT TGTCCTGCAA GAAATAACTA 
3051 GTCAACCCAA AAGGCAGTAT CTTTTACTTC ATTCCTTGAA GGAAATTATT 
3101 AGCTCTGCAT CAGTGGTGGG CCTTAAACCA TATGTTGAAA ACATCTGGGC 
3151 CTTATTACTA AAGCACTGTG AGTGTGCAGA GGAAGGAACC AGAAATGTTG 
3201 TTGCTGAATG TCTAGGAAAA CTCACTCTAA TTGATCCAGA AACTCTCCTT 
3251 CCACGGCTTA AGGGGTACTT GATATCAGGC TCATCATATG CCCGAAGCTC 
3301 AGTGGTTACG GCTGTGAAAT TTACAATTTC TGACCATCCA CAACCTATTG 
3351 ATCCACTGTT AAAGAACTGC ATAGGTGATT TCCTAAAAAC TTTGGAAGAC 
3401 CCAGATTTGA ATGTGAGAAG AGTAGCCTTG GTCACATTTA ATTCAGCAGC 
3451 ACATAACAAG CCATCATTAA TAAGGGATCT ATTGGATACT GTTCTTCCAC 
3501 ATCTTTACAA TGAAACAAAA GTTAGAAAGG AGCTTATAAG AGAGGTAGAA 
3551 ATGGGTCCAT TTAAACATAC GGTTGATGAT GGTCTGGATA TTAGAAAGGC 
3601 AGCATTTGAG TGTATGTACA CACTTCTAGA CAGTTGTCTT GATAGACTTG 
3651 ATATCTTTGA ATTTCTAAAT CATGTTGAAG ATGGTTTGAA GGACCATTAT 
3701 GATATTAAGA TGCTGACATT TTTAATGTTG GTGAGACTGT CTACCCTTTG 
3751 TCCAAGTGCA GTACTGCAGA GGTTGGACCG ACTTGTTGAG CCATTACGTG 
3801 CAACATGTAC AACTAAGGTA AAGGCAAACT CAGTAAAGCA GGAGTTTGAA 
3851 AAACAAGATG AATTAAAGCG ATCTGCCATG AGAGCAGTAG CAGCACTGCT 
3901 AACCATTCCA GAAGCAGAGA AGAGTCCACT GATGAGTGAA TTCCAGTCAC 
3951 AGATCAGTTC TAACCCTGAG CTGGCGGCTA TCTTTGAAAG TATCCAGAAA 
4001 GATTCATCAT CTACTAACTT GGAATCAATG GACACTAGTT AGATGTTTGT 
4051 TCACCATGGG GACCATTACA TATGACCATA CAATGCACTG AATTGACAGG 
4101 TTAATCATAA GACATGGAAA GAGAAGTGTC TAAAAGCTTC AAAATGTTCC 
4151 ACTTTTTTTT CCTTCATGGA GACTGTTTGT TTGGCTTTCT TCCATTGTTG 
4201 TTTTTGTAGC ATTTATTTCA GAAATGTGTA TTTCCATAAT CCAGAGGTTG 
4251 TAAAACCACT AGTGTTTTAG TGGTTACAGC AACATTTGAA ATGGAAACTA 
4301 AAAGTTAGGA TTTTATGGAG TATGGAGATA GGGTCCAGTA TCTATTTACC 
4351 CTGTAATGTT TAGGATTAAA ATGTTAAAAT TTTGTGACCA TGAATTTCTT 
4401 TCTTTTATAA ATTTTCTCAT TTAAAAATCA AAAATCTTGC AAAACAAAAA 
4451 CCATGTTTCT TTTTCTTGTA TAACTTTTTG TTTTCAGCAA CATAAATTGA 
4501 TTTTTAGCTG GCAGACAAGA ATATCCATAT AAGATTTGTT AACCATTTCA 
4551 GAGAGTTTGG CAATTTTTAA AAGATAATAA GGTATCATTT TTAAGTATGA 
4601 AAATTAACAA TATCCCTGTT GCGCACACTA ATTTTGCATG AGTAAGTTTA 
4651 CAAATATGTA TCGTCTGTAA AGCAGCATGT GCAGATTATT CATAATATAG 
4701 AAGTTAAAAT AAGTATTAGT GCAATTTTCA GATATTTATT TTTGCACAGA 
4751 AAACACATTA TCTGGAGAGA AAGAAAGGAG AATTTTTGAG ACTTGGGTTT 
4801 TCTTAATGCC AGTGTGAATT TGCAGATGTT TTCAGAAAAT CAAGTCACAG 
4851 TAACAATTTG CCACTTTTTT CTATTATAAA TCTTCTTACT TAAATTTTGA 
4901 ATATTTAGTT TTTCTCAGTT ACCCATTTGT GTGTGTGTGA TTCCACTTAG 
4951 AAATTCTTAA AACCAGATTT TTCTTTCATT CCGTTTGGAT GTCTACATTC 
5001 CTTATCAAAG GATATAAATA CTGTGTATGC TTTTGAATTT TATTTTTAGG 
5051 AAAATTCTGA AGCCAGCTAT CACAGGTTTG TTAGCTAATA ATAGTATTTT 
5101 CTTTTAGTTG AGTTAGGTTT TTCCCCATCT CCTGTAGAGC GAATTTACAT 
5151 ATTGTATTGG GTAAGTGTTC ACTACTTTTC CTGATTAAGG GATCTGTGCT 
5201 GGGGAACAAA GCTTTTGCAG TACCTTATAT TGTAGTTAAA ATTTTATTTA 
5251 ACATATCCTT CAGTGAGCTC ATTTCACACT GTAGCCTCTT CCTTAAAATT 
5301 TGTGGTGCTC CTGTAACAGT AAGAACTAAT TCTGAAATAA AAGACATCTC 
5351 CTAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAA 



BLAST Results 



Entry HS793345 from database EMBL: 
human STS WI-12457. 
Score = 1985, P = 1.3e-83, identities = 433/460 



Medline entries 



97127450: 

Molecular cloning of a novel 120-kDa TBP-interacting 
protein. 



Peptide information for frame 2 
ORF from 350 bp to 4039 bp; peptide length: 1230 
Category: known protein 
Classification: Nucleic acid management 
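
The ORF coordinates and the peptide length quoted above can be cross-checked with a short calculation. The following is a minimal sketch, assuming 1-based inclusive coordinates that exclude the stop codon; the function name and the includes_stop parameter are illustrative and not part of the original report.

```python
def orf_peptide_length(start_bp, end_bp, includes_stop=False):
    """Return the number of amino acids encoded by an ORF.

    start_bp and end_bp are assumed to be 1-based, inclusive coordinates.
    If the range also covers the stop codon, set includes_stop=True.
    """
    n_nt = end_bp - start_bp + 1
    if n_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    codons = n_nt // 3
    return codons - 1 if includes_stop else codons


# ORF from 350 bp to 4039 bp, as listed above
print(orf_peptide_length(350, 4039))
```

Under these assumptions the result agrees with the stated peptide length of 1230.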



1 MASASYHISN LLEKMTSSDK DFRFMATNDL MTELQKDSIK LDDDSERKVV 
51 KMILKLLEDK NGEVQNLAVK CLGPLVSKVK EYQVETIVDT LCTNMLSDKE 
101 QLRDISSIGL KTVIGELPPA SSGSALAANV CKKITGRLTS AIAKQEDVSV 
151 QLEALDIMAD MLSRQGGLLV NFHPSILTCL LPQLTSPRLA VRKRTIIALG 
201 HLVMSCGNIV FVDLIEHLLS ELSKNDSMST TRTYIQCIAA ISRQAGHRIG 
251 EYLEKIIPLV VKFCNVDDDE LREYCIQAFE SFVRRCPKEV YPHVSTIINI 
301 CLKYLTYDPN YNYDDEDEDE NAMDADGGDD DDQGSDDEYS DDDDMSWKVR 
351 RAAAKCLDAV VSTRHEMLPE FYKTVSPALI SRFKEREENV KADVFHAYLS 
401 LLKQTRPVQS WLCDPDAMEQ GETPLTMLQS QVPNIVKALH KQMKEKSVKT 
451 RQCCFNMLTE LVNVLPGALT QHIPVLVPGI IFSLNDKSSS SNLKIDALSC 
501 LYVILCNHSP QVFHPHVQAL VPPVVACVGD PFYKITSEAL LVTQQLVKVI 
551 RPLDQPSSFD ATPYIKDLFT CTIKRLKAAD IDQEVKERAI SCMGQIICNL 
601 GDNLGSDLPN TLQIFLERLK NEITRLTTVK ALTLIAGSPL KIDLRPVLGE 
651 GVPILASFLR KNQRALKLGT LSALDILIKN YSDSLTAAMI DAVLDELPPL 
701 ISESDMHVSQ MAISFLTTLA KVYPSSLSKI SGSILNELIG LVRSPLLQGG 
751 ALSAMLDFFQ ALVVTGTNNL GYMDLLRMLT GPVYSQSTAL THKQSYYSIA 
801 KCVAALTRAC PKEGPAVVGQ FIQDVKNSRS TDSIRLLALL SLGEVGHHID 
851 LSGQLELKSV ILEAFSSPSE EVKSAASYAL GSISVGNLPE YLPFVLQEIT 
901 SQPKRQYLLL HSLKEIISSA SVVGLKPYVE NIWALLLKHC ECAEEGTRNV 
951 VAECLGKLTL IDPETLLPRL KGYLISGSSY ARSSVVTAVK FTISDHPQPI 
1001 DPLLKNCIGD FLKTLEDPDL NVRRVALVTF NSAAHNKPSL IRDLLDTVLP 
1051 HLYNETKVRK ELIREVEMGP FKHTVDDGLD IRKAAFECMY TLLDSCLDRL 
1101 DIFEFLNHVE DGLKDHYDIK MLTFLMLVRL STLCPSAVLQ RLDRLVEPLR 
1151 ATCTTKVKAN SVKQEFEKQD ELKRSAMRAV AALLTIPEAE KSPLMSEFQS 
1201 QISSNPELAA IFESIQKDSS STNLESMDTS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_22g2, frame 2 

TREMBL:AB020636_1 gene: "KIAA0829"; product: "KIAA0829 protein"; Homo 
sapiens mRNA for KIAA0829 protein, partial cds., N = 1, Score = 5986, P 
= 0 

TREMBL:RND6711_1 gene: "tip120"; product: "TIP120"; Rattus norvegicus 
mRNA for TIP120, complete cds., N = 1, Score = 6203, P = 0 



>TREMBL:RND6711_1 gene: "tip120"; product: "TIP120"; Rattus norvegicus mRNA 
for TIP120, complete cds. 
Length = 1230 

HSPs: 

Score = 6203 (930.7 bits), Expect = 0.0e+00, P = 0.0e+00 
Identities = 1227/1230 (99%), Positives = 1228/1230 (99%) 

Query: 1 MASASYHISNLLEKMTSSDKDFRFMATNDLMTELQKDSIKLDDDSERKVVKMILKLLEDK 60 

MASASYHISNLLEKMTSSDKDFRFMATNDLMTELQKDSIKLDDDSERKVVKMILKLLEDK 
Sbjct: 1 MASASYHISNLLEKMTSSDKDFRFMATNDLMTELQKDSIKLDDDSERKVVKMILKLLEDK 60 

Query: 61 NGEVQNLAVKCLGPLVSKVKEYQVETIVDTLCTNMLSDKEQLRDISSIGLKTVIGELPPA 120 

NGEVQNLAVKCLGPLVSKVKEYQVETIVDTLCTNMLSDKEQLRDISSIGLKTVIGELPPA 
Sbjct: 61 NGEVQNLAVKCLGPLVSKVKEYQVETIVDTLCTNMLSDKEQLRDISSIGLKTVIGELPPA 120 

Query: 121 SSGSALAANVCKKITGRLTSAIAKQEDVSVQLEALDIMADMLSRQGGLLVNFHPSILTCL 180 

SSGSALAANVCKKITGRLTSAIAKQEDVSVQLEALDIMADMLSRQGGLLVNFHPSILTCL 
Sbjct: 121 SSGSALAANVCKKITGRLTSAIAKQEDVSVQLEALDIMADMLSRQGGLLVNFHPSILTCL 180 

Query: 181 LPQLTSPRLAVRKRTIIALGHLVMSCGNIVFVDLIEHLLSELSKNDSMSTTRTYIQCIAA 240 

LPQLTSPRLAVRKRTIIALGHLVMSCGNIVFVDLIEHLLSELSKNDSMSTTRTYIQCIAA 
Sbjct: 181 LPQLTSPRLAVRKRTIIALGHLVMSCGNIVFVDLIEHLLSELSKNDSMSTTRTYIQCIAA 240 

Query: 241 ISRQAGHRIGEYLEKIIPLVVKFCNVDDDELREYCIQAFESFVRRCPKEVYPHVSTIINI 300 

ISRQAGHRIGEYLEKIIPLVVKFCNVDDDELREYCIQAFESFVRRCPKEVYPHVSTIINI 
Sbjct: 241 ISRQAGHRIGEYLEKIIPLVVKFCNVDDDELREYCIQAFESFVRRCPKEVYPHVSTIINI 300 

Query: 301 CLKYLTYDPNYNYDDEDEDENAMDADGGDDDDQGSDDEYSDDDDMSWKVRRAAAKCLDAV 360 

CLKYLTYDPNYNYDDEDEDENAMDADGGDDDDQGSDDEYSDDDDMSWKVRRAAAKCLDAV 
Sbjct: 301 CLKYLTYDPNYNYDDEDEDENAMDADGGDDDDQGSDDEYSDDDDMSWKVRRAAAKCLDAV 360 

Query: 361 VSTRHEMLPEFYKTVSPALISRFKEREENVKADVFHAYLSLLKQTRPVQSWLCDPDAMEQ 420 

VSTRHEMLPEFYKTVSPALISRFKEREENVKADVFHAYLSLLKQTRPVQSWLCDPDAMEQ 
Sbjct: 361 VSTRHEMLPEFYKTVSPALISRFKEREENVKADVFHAYLSLLKQTRPVQSWLCDPDAMEQ 420 
Query: 421 GETPLTMLQSQVPNIVKALHKQMKEKSVKTRQCCFNMLTELVNVLPGALTQHIPVLVPGI 480 

GETPLTMLQSQVPNIVKALHKQMKEKSVKTRQCCFNMLTELVNVLPGALTQHIPVLVPGI 
Sbjct: 421 GETPLTMLQSQVPNIVKALHKQMKEKSVKTRQCCFNMLTELVNVLPGALTQHIPVLVPGI 480 

Query: 481 IFSLNDKSSSSNLKIDALSCLYVILCNHSPQVFHPHVQALVPPVVACVGDPFYKITSEAL 540 

IFSLNDKSSSSNLKIDALSCLYVILCNHSPQVFHPHVQALVPPVVACVGDPFYKITSEAL 
Sbjct: 481 IFSLNDKSSSSNLKIDALSCLYVILCNHSPQVFHPHVQALVPPVVACVGDPFYKITSEAL 540 

Query: 541 LVTQQLVKVIRPLDQPSSFDATPYIKDLFTCTIKRLKAADIDQEVKERAISCMGQIICNL 600 

LVTQQLVKVIRPLDQPSSFDATPYIKDLFTCTIKRLKAADIDQEVKERAISCMGQIICNL 
Sbjct: 541 LVTQQLVKVIRPLDQPSSFDATPYIKDLFTCTIKRLKAADIDQEVKERAISCMGQIICNL 600 

Query: 601 GDNLGSDLPNTLQIFLERLKNEITRLTTVKALTLIAGSPLKIDLRPVLGEGVPILASFLR 660 

GDNLG DL NTLQIFLERLKNEITRLTTVKALTLIAGSPLKIDLRPVLGEGVPILASFLR 
Sbjct: 601 GDNLGPDLSNTLQIFLERLKNEITRLTTVKALTLIAGSPLKIDLRPVLGEGVPILASFLR 660 

Query: 661 KNQRALKLGTLSALDILIKNYSDSLTAAMIDAVLDELPPLISESDMHVSQMAISFLTTLA 720 

KNQRALKLGTLSALDILIKNYSDSLTAAMIDAVLDELPPLISESDMHVSQMAISFLTTLA 
Sbjct: 661 KNQRALKLGTLSALDILIKNYSDSLTAAMIDAVLDELPPLISESDMHVSQMAISFLTTLA 720 

Query: 721 KVYPSSLSKISGSILNELIGLVRSPLLQGGALSAMLDFFQALVVTGTNNLGYMDLLRMLT 780 

KVYPSSLSKISGSILNELIGLVRSPLLQGGALSAMLDFFQALVVTGTNNLGYMDLLRMLT 
Sbjct: 721 KVYPSSLSKISGSILNELIGLVRSPLLQGGALSAMLDFFQALVVTGTNNLGYMDLLRMLT 780 

Query: 781 GPVYSQSTALTHKQSYYSIAKCVAALTRACPKEGPAVVGQFIQDVKNSRSTDSIRLLALL 840 

GPVYSQSTALTHKQSYYSIAKCVAALTRACPKEGPAVVGQFIQDVKNSRSTDSIRLLALL 
Sbjct: 781 GPVYSQSTALTHKQSYYSIAKCVAALTRACPKEGPAVVGQFIQDVKNSRSTDSIRLLALL 840 

Query: 841 SLGEVGHHIDLSGQLELKSVILEAFSSPSEEVKSAASYALGSISVGNLPEYLPFVLQEIT 900 

SLGEVGHHIDLSGQLELKSVILEAFSSPSEEVKSAASYALGSISVGNLPEYLPFVLQEIT 
Sbjct: 841 SLGEVGHHIDLSGQLELKSVILEAFSSPSEEVKSAASYALGSISVGNLPEYLPFVLQEIT 900 

Query: 901 SQPKRQYLLLHSLKEIISSASVVGLKPYVENIWALLLKHCECAEEGTRNVVAECLGKLTL 960 

SQPKRQYLLLHSLKEIISSASVVGLKPYVENIWALLLKHCECAEEGTRNVVAECLGKLTL 
Sbjct: 901 SQPKRQYLLLHSLKEIISSASVVGLKPYVENIWALLLKHCECAEEGTRNVVAECLGKLTL 960 

Query: 961 IDPETLLPRLKGYLISGSSYARSSVVTAVKFTISDHPQPIDPLLKNCIGDFLKTLEDPDL 1020 

IDPETLLPRLKGYLISGSSYARSSVVTAVKFTISDHPQPIDPLLKNCIGDFLKTLEDPDL 
Sbjct: 961 IDPETLLPRLKGYLISGSSYARSSVVTAVKFTISDHPQPIDPLLKNCIGDFLKTLEDPDL 1020 

Query: 1021 NVRRVALVTFNSAAHNKPSLIRDLLDTVLPHLYNETKVRKELIREVEMGPFKHTVDDGLD 1080 

NVRRVALVTFNSAAHNKPSLIRDLLD+VLPHLYNETKVRKELIREVEMGPFKHTVDDGLD 
Sbjct: 1021 NVRRVALVTFNSAAHNKPSLIRDLLDSVLPHLYNETKVRKELIREVEMGPFKHTVDDGLD 1080 

Query: 1081 IRKAAFECMYTLLDSCLDRLDIFEFLNHVEDGLKDHYDIKMLTFLMLVRLSTLCPSAVLQ 1140 

IRKAAFECMYTLLDSCLDRLDIFEFLNHVEDGLKDHYDIKMLTFLMLVRLSTLCPSAVLQ 
Sbjct: 1081 IRKAAFECMYTLLDSCLDRLDIFEFLNHVEDGLKDHYDIKMLTFLMLVRLSTLCPSAVLQ 1140 

Query: 1141 RLDRLVEPLRATCTTKVKANSVKQEFEKQDELKRSAMRAVAALLTIPEAEKSPLMSEFQS 1200 

RLDRLVEPLRATCTTKVKANSVKQEFEKQDELKRSAMRAVAALLTIPEAEKSPLMSEFQS 
Sbjct: 1141 RLDRLVEPLRATCTTKVKANSVKQEFEKQDELKRSAMRAVAALLTIPEAEKSPLMSEFQS 1200 

Query: 1201 QISSNPELAAIFESIQKDSSSTNLESMDTS 1230 

QISSNPELAAIFESIQKDSSSTNLESMDTS 
Sbjct: 1201 QISSNPELAAIFESIQKDSSSTNLESMDTS 1230 
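
The integer percentages in the HSP header can be reproduced from the raw counts. The sketch below assumes the percentages are truncated rather than rounded, which is consistent with 1227/1230 being reported as 99%; the helper name is illustrative only.

```python
def blast_percent(matches, aligned_length):
    """Integer percentage as printed in the HSP header (assumed truncated, not rounded)."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length


# Identities = 1227/1230, Positives = 1228/1230 from the alignment above
print(blast_percent(1227, 1230), blast_percent(1228, 1230))
```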



Pedant information for DKFZphtes3_22g2, frame 2 



Report for DKFZphtes3_22g2.2 



[LENGTH] 1230 

[MW] 136376.58 

[pI] 5.52 

[HOMOL] TREMBL:RND6711_1 gene: "tip120"; product: "TIP120"; Rattus norvegicus mRNA for TIP120, 
complete cds. 0.0 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 5.28 % 



SEQ MASASYHISNLLEKMTSSDKDFRFMATNDLMTELQKDSIKLDDDSERKVVKMILKLLEDK 

SEG 

PRD cccccchhhhhhhhhcccccceeeeehhhhhhhhhcccccccccchhhhhhhhhhhhhcc 

MEM 

SEQ NGEVQNLAVKCLGPLVSKVKEYQVETIVDTLCTNMLSDKEQLRDISSIGLKTVIGELPPA 

SEG xxxx 

PRD ccccceeeeeeeeceeeeehhhhhhhhhhhhccchhhhhcccccccchhhhhhhhhcccc 

MEM 

SEQ SSGSALAANVCKKITGRLTSAIAKQEDVSVQLEALDIMADMLSRQGGLLVNFHPSILTCL 

SEG xxxxxxxx 

PRD cccccchhhhhhhccchhhhhhhccccchhhhhhhhhhhhhhhhhccceeeecchhhhhh 

MEM 

SEQ LPQLTSPRLAVRKRTIIALGHLVMSCGNIVFVDLIEHLLSELSKNDSMSTTRTYIQCIAA 

SEG 

PRD hcccccchhhhhhhhhhhheeeeecccceeehhhhhhhhhhhhccccchhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMM 

SEQ ISRQAGHRIGEYLEKIIPLVVKFCNVDDDELREYCIQAFESFVRRCPKEVYPHVSTIINI 

SEG 

PRD hhhhcccccccchhhhhhhhheeeeccchhhhhhhhhhhhhhhhccccceeecchhhhhh 

MEM 

SEQ CLKYLTYDPNYNYDDEDEDENAMDADGGDDDDQGSDDEYSDDDDMSWKVRRAAAKCLDAV 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhh 

MEM 

SEQ VSTRHEMLPEFYKTVSPALISRFKEREENVKADVFHAYLSLLKQTRPVQSWLCDPDAMEQ 

SEG 

PRD hhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhhccceeeecccccccc 

MEM 

SEQ GETPLTMLQSQVPNIVKALHKQMKEKSVKTRQCCFNMLTELVNVLPGALTQHIPVLVPGI 

SEG 

PRD cccchhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhccccccccceeeecce 

MEM 

SEQ IFSLNDKSSSSNLKIDALSCLYVILCNHSPQVFHPHVQALVPPVVACVGDPFYKITSEAL 

SEG xxxxxxxxxxxxxxxx 

PRD eeeeccccccccchhhhhhhheeeeecccccccccceeeeecceeeeecccchhhhhhhh 

MEM 

SEQ LVTQQLVKVIRPLDQPSSFDATPYIKDLFTCTIKRLKAADIDQEVKERAISCMGQIICNL 

SEG 

PRD hhhhhhhhhhcccccccccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhheeeecc 

MEM 

SEQ GDNLGSDLPNTLQIFLERLKNEITRLTTVKALTLIAGSPLKIDLRPVLGEGVPILASFLR 

SEG 

PRD cccccccccchhhhhhhhhcchhhhhhhhhhhheeeeccccccccceeehhhhhhhhhhh 

MEM 

SEQ KNQRALKLGTLSALDILIKNYSDSLTAAMIDAVLDELPPLISESDMHVSQMAISFLTTLA 

SEG 

PRD hhhhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhcccccccchhhhhhhhhhhhhhhh 

MEM 

SEQ KVYPSSLSKISGSILNELIGLVRSPLLQGGALSAMLDFFQALVVTGTNNLGYMDLLRMLT 

SEG 

PRD cccccceeecchhhhhhhhhhhccccccchhhhhhhhhhhheeeecccccchhhhhhhhc 

MEM 

SEQ GPVYSQSTALTHKQSYYSIAKCVAALTRACPKEGPAVVGQFIQDVKNSRSTDSIRLLALL 

SEG 

PRD cccccccccchhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhcccccchhhhhhhh 

MEM 

SEQ SLGEVGHHIDLSGQLELKSVILEAFSSPSEEVKSAASYALGSISVGNLPEYLPFVLQEIT 

SEG 

PRD hccccccccccccccccceeeeeeccccchhhhhhhhhhhccccccccccchhhhhhhhh 

MEM 

SEQ SQPKRQYLLLHSLKEIISSASVVGLKPYVENIWALLLKHCECAEEGTRNVVAECLGKLTL 

SEG 

PRD cccchhhhhhhhhhhhhhcccceeehhhhhhhhhhhhhhhhcccccceeeeecccccccc 

MEM 

SEQ IDPETLLPRLKGYLISGSSYARSSVVTAVKFTISDHPQPIDPLLKNCIGDFLKTLEDPDL 

SEG 

PRD cccccccccccccccccccccchhhhhhhhhhhccccccccccchhhhhhhhhhhccccc 

MEM 

SEQ NVRRVALVTFNSAAHNKPSLIRDLLDTVLPHLYNETKVRKELIREVEMGPFKHTVDDGLD 

SEG 

PRD ccceeeeeeecccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccch 

MEM 
SEQ IRKAAFECMYTLLDSCLDRLDIFEFLNHVEDGLKDHYDIKMLTFLMLVRLSTLCPSAVLQ 

SEG 

PRD hhhhhhhhhhhhhhhccccccceeeecccccccccchhhhhhhhhhhhhhhhcccchhhh 

MEM 

SEQ RLDRLVEPLRATCTTKVKANSVKQEFEKQDELKRSAMRAVAALLTIPEAEKSPLMSEFQS 

SEG 

PRD hhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccchhhhh 

MEM 

SEQ QISSNPELAAIFESIQKDSSSTNLESMDTS 

SEG 

PRD hhhccchhhhhhhhhhhccccccccccccc 

MEM 

(No Prosite data available for DKFZphtes3_22g2.2) 
(No Pfam data available for DKFZphtes3_22g2.2) 

DKFZphtes3_22nl3 
group: testes derived 

DKFZphtes3_22nl3 encodes a novel 677 amino acid protein without similarity to known proteins. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

dJ1042K10.3, complete 
Sequenced by LMU 
Locus: /map="22q13.1-13.2" 
Insert length: 3353 bp 

Poly A stretch at pos. 3315, polyadenylation signal at pos. 3298 
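
An annotation of this kind can be approximated by scanning a window upstream of the poly A stretch for a polyadenylation hexamer. The following is a minimal sketch under stated assumptions: the hexamer set (AATAAA and the common variant ATTAAA), the 40-nt window, the 1-based positions and the function name are all illustrative choices, not the method used to produce this report, so reported positions may differ slightly from the ones listed above.

```python
def find_polya_signals(seq, polya_start, window=40, hexamers=("AATAAA", "ATTAAA")):
    """Return (position, hexamer) pairs found upstream of the poly A stretch.

    polya_start is the 1-based position where the poly A stretch begins.
    Only the `window` nucleotides immediately upstream are searched.
    Returned positions are 1-based and refer to the first base of the hexamer.
    """
    seq = seq.upper()
    lo = max(0, polya_start - 1 - window)
    region = seq[lo:polya_start - 1]
    hits = []
    for i in range(len(region) - 5):
        candidate = region[i:i + 6]
        if candidate in hexamers:
            hits.append((lo + i + 1, candidate))
    return hits


# Toy example (not taken from the listing): hexamer at position 21, poly A from position 42
toy = "C" * 20 + "AATAAA" + "C" * 15 + "A" * 20
print(find_polya_signals(toy, polya_start=42))
```

Restricting the search to a window avoids reporting hexamers that occur far upstream of the poly A stretch.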

1 ATGGAACCAC TATCCCCACT GCCAAGTCCA CCCCCACACT CATTAAGCAA 
51 AGCCAACCCA AGTCTGCCAG TGAGAAGTCA CAGCGCAGCA AGAAGGCCAA 

101 GGAGCTGAAG CCAAAGGTGA AGAAGCTCAA GTACCACCAG TACATCCCCC 

151 CGGACCAGAA GCAGGACAGG GGGGCACCCC CCATGGACTC ATCCTACGCC 

201 AAGATCCTGC AGCAGCAGCA GCTCTTCCTC CAGCTGCAGA TCCTCAACCA 

251 GCAGCAGCAG CAGCACCACA ACTACCAGGC CATCCTGCCT GCCCCGCCAA 

301 AGTCAGCAGG CGAGGCCCTG GGAAGCAGCG GGACCCCCCC AGTACGCAGC 

351 CTCTCCACTA CCAATAGCAG CTCCAGCTCG GGCGCCCCTG GGCCCTGTGG 

401 GCTGGCACGT CAGAACAGCA CCTCACTGAC TGGCAAGCCG GGAGCCCTGC 

451 CGGCCAACCT GGACGACATG AAGGTGGCAG AGCTGAAGCA GGAGCTGAAG 

501 TTGCGATCAC TGCCTGTCTC GGGCACCAAA ACTGAGCTGA TTGAGCGCCT 

551 TCGAGCCTAT CAAGACCAAA TCAGCCCTGT GCCAGGAGCC CCCAAGGCCC 

601 CTGCCGCCAC CTCTATCCTG CACAAGGCTG GCGAGGTGGT GGTAGCCTTC 

651 CCAGCGGCCC GGCTGAGCAC GGGGCCAGCC CTGGTGGCAG CAGGCCTGGC 

701 TCCAGCTGAG GTGGTGGTGG CCACGGTGGC CAGCAGTGGG GTGGTGAAGT 

751 TTGGCAGCAC GGGCTCCACG CCCCCCGTGT CTCCCACCCC CTCGGAGCGC 

801 TCACTGCTCA GCACGGGCGA TGAAAACTCC ACCCCCGGGG ACACCTTTGG 

851 TGAGATGGTG ACATCACCTC TGACGCAGCT GACCCTGCAG GCCTCGCCAC 

901 TGCAGATCCT CGTGAAGGAG GAGGGCCCCC GGGCCGGGTC CTGTTGCCTG 

951 AGCCCTGGGG GGCGGGCGGA GCTAGAGGGG CGCGACAAGG ACCAGATGCT 
1001 GCAGGAGAAA GACAAGCAGA TCGAGGCGCT GACGCGCATG CTCCGGCAGA 
1051 AGCAGCAGCT GGTGGAGCGG CTCAAGCTGC AGCTGGAGCA GGAGAAGCGA 
1101 GCCCAGCAGC CCGCCCCCGC CCCCGCCCCC CTCGGCACCC CCGTGAAGCA 
1151 GGAGAACAGC TTCTCCAGCT GCCAGCTGAG CCAGCAGCCC CTGGGCCCCG 
1201 CTCACCCATT CAACCCCAGC CTGGCGGCCC CAGCCACCAA CCACATAGAC 
1251 CCTTGTGCTG TGGCCCCAGG GCCCCCGTCC GTGGTGGTGA AGCAGGAAGC 
1301 CTTGCAGCCT GAGCCCGAGC CGGTCCCCGC CCCCCAGTTG CTTCTGGGGC 
1351 CTCAGGGCCC CGGCCTCATC AAGGGGGTTG CACCTCCCAC CCTCATCACC 
1401 GACTCCACAG GGACCCACCT TGTCCTCACC GTGACCAATA AGAATGCAGA 
1451 CAGCCCTGGC CTGTCCAGTG GGAGCCCCCA GCAGCCCTCG TCCCAGCCTG 
1501 GCTCTCCAGC GCCTGCCCCC TCTGCCCAGA TGGACCTGGA GCACCCACTG 
1551 CAGCCCCTCT TTGGGACCCC CACTTCTCTG CTGAAGAAGG AACCACCTGG 
1601 CTATGAGGAA GCCATGAGCC AGCAGCCCAA ACAGCAGGAA AATGGTTCCT 
1651 CAAGCCAGCA GATGGACGAC CTGTTTGACA TTCTCATTCA GAGCGGAGAA 
1701 ATTTCAGCAG ATTTCAAGGA GCCGCCATCC CTGCCAGGGA AGGAGAAGCC 
1751 ATCCCCGAAG ACAGTCTGTG GGTCCCCCCT GGCAGCACAG CCATCACCTT 
1801 CTGCTGAGCT CCCCCAGGCT GCCCCACCTC CTCCAGGCTC ACCCTCCCTC 
1851 CCTGGACGCC TGGAGGACTT CCTGGAGAGC AGCACGGGGC TGCCCCTGCT 
1901 GACCAGTGGG CATGACGGGC CAGAGCCCCT TTCCCTCATT GACGACCTCC 
1951 ATAGCCAGAT GCTGAGCAGC ACTGCCATCC TGGACCACCC CCCGTCACCC 
2001 ATGGACACCT CGGAATTGCA CTTTGTTCCT GAGCCCAGCA GCACCATGGG 
2051 CCTGGACCTG GCTGATGGCC ACCTGGACAG CATGGACTGG CTGGAGCTGT 
2101 CGTCAGGTGG TCCCGTGCTG AGCCTAGCCC CCCTCAGCAC CACAGCCCCC 
2151 AGCCTCTTCT CCACAGACTT CCTCGATGGC CATGATTTGC AGCTGCACTG 
2201 GGATTCCTGC TTGTAGCTCT CTGGCTCAAG ACGGGGTGGG GAAGGGGCTG 
2251 GGAGCCAGGG TACTCCAATG CGTGGCTCTC CTGCGTGATT CGGCCTCTCC 
2301 ACATGGTTGT GAGTCTTGAC AATCACAGCC CCTGCTTTTT CCCTTCCCTG 
2351 GGAGGCTAGA ACAGAGAAGC CCTTACTCCT GGTTCAGTGC CACGCAGGGC 
2401 AGAGGAGAGC AGCTGTCAAG AAGCAGCCCT GGCTCTCACG CTGGGGTTTT 
2451 GGACACACGG TCAGGGTCAG GGCCATTTCA GCTTGACCTC CTTTTTTGAG 
2501 GTCAGGGGGC ACTGTCTGTC TGGCTACAAT TTGGCTAAGG TAGGTGAAGC 
2551 CTGGCCAGGC GGGAGGCTTC TCTTCTGACC CAGGGCTGAG ACAGGTTAAG 
2601 GGGTGAATCT CCTTCCTTTC TCTCCCTGCT TTGCTGTGAA GGGAGAAATT 
2651 AGCCTGGGCC TCTACCCCCT ATTCCCTGTG TCTGCCAACC CCAGGATCCC 
2701 AGGGCTCCCT GCCATTTTAG TGTCTTGGTG TAGTGTAACC ATTTAGTGGT 
2751 TGGTGGCAAC AATTTTATGT ACAGGTGTAT ATACCTCTAT ATTATATATC 
2801 GACATACATA TATATTTTTG GGGGGGGGCG GACAGGAGAT GGGTGCAACT 
2851 CCCTCCCATC CTACTCTCAC AGAAGGGCCT GGATGCAAGG TTACCCTTGA 
2901 GCTGTGTGCC ACAGTCTGGT GCCCAGTCTG GCATGCAGCT ACCCAGGCCC 
2951 ACCCATCACG TGTGATTGAC ATGTAGGTAC CCTGCCACGG CCTATGCCCC 
3001 ACCTGCCCTG CTTCCTGGCT CCTTATCAGT GCCATGAGGG CAGAGGTGCT 
3051 ACCTGGCCTT CCTGCCAGGA GCTCTCCACC CACTCACATT CCGTCCCCGC 
3101 CGCCTCACTG CAGCCAGCGT GGCCCTAGGA CAGGAGGAGC TTCGGGCCCA 
3151 GCTTCACCCT GCGGTGGGGC TGAGGGGTGG CCATCTCCTG CCCTGGGGCC 
3201 ACTGGCTTCA CATTCTGGGC TGACTCATAG GGGAGTAGGG GTGGAGTCAC 
3251 CAAAACCAGT GCTGGGACAA AGATGGGGAA GGTGTGTGAA CTTTTTAAAA 
3301 TAAACACAAA AACACAGGAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3351 AAG 



BLAST Results 



Entry HS1042K10 from database EMBL: 

Human DNA sequence from clone 1042K10 on chromosome 22q13.1-13.2. 
Contains the ADSL gene for Adenylosuccinate lyase (EC 4.3.2.2, 
Adenylosuccinase, ASL) and 4 novel genes (one with probable rabGAP 
domains and Src homology domain 3). Contains ESTs, STSs, GSSs and a 
putative CpG island. 

Score = 7997, P = 0.0e+00, identities = 1617/1645 
7 exons 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 183 bp to 2213 bp; peptide length: 677 
Category: similarity to unknown protein 
Classification: unclassified 



1 MDSSYAKILQ QQQLFLQLQI LNQQQQQHHN YQAILPAPPK SAGEALGSSG 
51 TPPVRSLSTT NSSSSSGAPG PCGLARQNST SLTGKPGALP ANLDDMKVAE 
101 LKQELKLRSL PVSGTKTELI ERLRAYQDQI SPVPGAPKAP AATSILHKAG 
151 EVVVAFPAAR LSTGPALVAA GLAPAEVVVA TVASSGVVKF GSTGSTPPVS 
201 PTPSERSLLS TGDENSTPGD TFGEMVTSPL TQLTLQASPL QILVKEEGPR 
251 AGSCCLSPGG RAELEGRDKD QMLQEKDKQI EALTRMLRQK QQLVERLKLQ 
301 LEQEKRAQQP APAPAPLGTP VKQENSFSSC QLSQQPLGPA HPFNPSLAAP 
351 ATNHIDPCAV APGPPSVVVK QEALQPEPEP VPAPQLLLGP QGPGLIKGVA 
401 PPTLITDSTG THLVLTVTNK NADSPGLSSG SPQQPSSQPG SPAPAPSAQM 
451 DLEHPLQPLF GTPTSLLKKE PPGYEEAMSQ QPKQQENGSS SQQMDDLFDI 
501 LIQSGEISAD FKEPPSLPGK EKPSPKTVCG SPLAAQPSPS AELPQAAPPP 
551 PGSPSLPGRL EDFLESSTGL PLLTSGHDGP EPLSLIDDLH SQMLSSTAIL 
601 DHPPSPMDTS ELHFVPEPSS TMGLDLADGH LDSMDWLELS SGGPVLSLAP 
651 LSTTAPSLFS TDFLDGHDLQ LHWDSCL 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_22nl3, frame 3 

TREMBL:HS1042K10_6 gene: "dJ1042K10.3"; product: "dJ1042K10.3 (novel 
protein)"; Human DNA sequence from clone 1042K10 on chromosome 
22q13.1-13.2. Contains the ADSL gene for Adenylosuccinate lyase (EC 
4.3.2.2, Adenylosuccinase, ASL) and 4 novel genes (one with probable 
rabGAP domains and Src homology domain 3). Contains ESTs, STSs, GSSs 
and a putative CpG island., N = 1, Score = 1285, P = 4.9e-131 

TREMBL:CEUK06A9_3 gene: "K06A9.1a"; Caenorhabditis elegans cosmid 
K06A9., N = 2, Score = 149, P = 1.3e-09 

TREMBLNEW:SSI132828_1 product: "p210 protein"; Spermatozopsis similis 
mRNA for p210 protein, partial, N = 1, Score = 171, P = 2.8e-09 



>TREMBL:HS1042K10_6 gene: "dJ1042K10.3"; product: "dJ1042K10.3 (novel 
protein)"; Human DNA sequence from clone 1042K10 on chromosome 
22q13.1-13.2. Contains the ADSL gene for Adenylosuccinate lyase (EC 
4.3.2.2, Adenylosuccinase, ASL) and 4 novel genes (one with probable rabGAP 
domains and Src homology domain 3). Contains ESTs, STSs, GSSs and a 
putative CpG island. 
Length = 243 

HSPs: 

Score = 1285 (192.8 bits), Expect = 4.9e-131, P = 4.9e-131 
Identities = 243/243 (100%), Positives = 243/243 (100%) 

Query: 435 PSSQPGSPAPAPSAQMDLEHPLQPLFGTPTSLLKKEPPGYEEAMSQQPKQQENGSSSQQM 494 

PSSQPGSPAPAPSAQMDLEHPLQPLFGTPTSLLKKEPPGYEEAMSQQPKQQENGSSSQQM 
Sbjct: 1 PSSQPGSPAPAPSAQMDLEHPLQPLFGTPTSLLKKEPPGYEEAMSQQPKQQENGSSSQQM 60 

Query: 495 DDLFDILIQSGEISADFKEPPSLPGKEKPSPKTVCGSPLAAQPSPSAELPQAAPPPPGSP 554 

DDLFDILIQSGEISADFKEPPSLPGKEKPSPKTVCGSPLAAQPSPSAELPQAAPPPPGSP 
Sbjct: 61 DDLFDILIQSGEISADFKEPPSLPGKEKPSPKTVCGSPLAAQPSPSAELPQAAPPPPGSP 120 

Query: 555 SLPGRLEDFLESSTGLPLLTSGHDGPEPLSLIDDLHSQMLSSTAILDHPPSPMDTSELHF 614 

SLPGRLEDFLESSTGLPLLTSGHDGPEPLSLIDDLHSQMLSSTAILDHPPSPMDTSELHF 
Sbjct: 121 SLPGRLEDFLESSTGLPLLTSGHDGPEPLSLIDDLHSQMLSSTAILDHPPSPMDTSELHF 180 

Query: 615 VPEPSSTMGLDLADGHLDSMDWLELSSGGPVLSLAPLSTTAPSLFSTDFLDGHDLQLHWD 674 

VPEPSSTMGLDLADGHLDSMDWLELSSGGPVLSLAPLSTTAPSLFSTDFLDGHDLQLHWD 
Sbjct: 181 VPEPSSTMGLDLADGHLDSMDWLELSSGGPVLSLAPLSTTAPSLFSTDFLDGHDLQLHWD 240 

Query: 675 SCL 677 
SCL 

Sbjct: 241 SCL 243 



Pedant information for DKFZphtes3_22nl3, frame 3 



Report for DKFZphtes3_22nl3.3 



[LENGTH] 677 

[MW] 70743.01 

[pI] 4.93 

[HOMOL] TREMBL:HS1042K10_6 gene: "dJ1042K10.3"; product: "dJ1042K10.3 (novel protein)"; 
Human DNA sequence from clone 1042K10 on chromosome 22q13.1-13.2. Contains the ADSL gene for 
Adenylosuccinate lyase (EC 4.3.2.2, Adenylosuccinase, ASL) and 4 novel genes (one with 
probable rabGAP domains and Src homology domain 3). Contains ESTs, STSs, GSSs and a putative 
CpG island. 1e-111 



[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 21.57 % 

[KW] COILED_COIL 4.58 % 

SEQ MDSSYAKILQQQQLFLQLQILNQQQQQHHNYQAILPAPPKSAGEALGSSGTPPVRSLSTT 

SEG xxxxxxxxxxxxxxxxxxx xxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhcceeeeeecccccceeeecccccccceeecccc 

COILS 

MEM 

SEQ NSSSSSGAPGPCGLARQNSTSLTGKPGALPANLDDMKVAELKQELKLRSLPVSGTKTELI 

SEG xxxxxx 

PRD cccccccccccceeecccccccccccccccccccchhhhhhhhhhhhhhcccccchhhhh 

COILS 

MEM 

SEQ ERLRAYQDQISPVPGAPKAPAATSILHKAGEVVVAFPAARLSTGPALVAAGLAPAEVVVA 

SEG xxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhcccccccccccceeeeeeeccceeeeccccccccccccccccccceeeeee 

COILS 

MEM MMMMMMMMMMMMMMMMMMMMMM 

SEQ TVASSGVVKFGSTGSTPPVSPTPSERSLLSTGDENSTPGDTFGEMVTSPLTQLTLQASPL 

SEG xxxxxxxx . . xxxxxxxxxxxxxx 

PRD eeecccccccccccccccccccccceeeeccccccccccccccceeecccceeeecccce 

COILS 

MEM M 

SEQ QILVKEEGPRAGSCCLSPGGRAELEGRDKDQMLQEKDKQIEALTRMLRQKQQLVERLKLQ 

SEG 

PRD eeeeeccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCC 

MEM 

SEQ LEQEKRAQQPAPAPAPLGTPVKQENSFSSCQLSQQPLGPAHPFNPSLAAPATNHIDPCAV 

SEG xxxxxxxxxx 

PRD hhhhhhhhhcccccccccccccccccceeeeecccccccccccccceeeccccccccccc 

COILS CCCCCCC 

MEM 

SEQ APGPPSVVVKQEALQPEPEPVPAPQLLLGPQGPGLIKGVAPPTLITDSTGTHLVLTVTNK 

SEG xxxxxxxxxxxx 

PRD cccccceeeeeccccccccccccceeeccccccceeeeecccccccccccceeeeeeecc 

COILS 

MEM 

SEQ NADSPGLSSGSPQQPSSQPGSPAPAPSAQMDLEHPLQPLFGTPTSLLKKEPPGYEEAMSQ 

SEG . . .xxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccchhhhhhhhhcccccccccccccccccccccccc 

COILS 

MEM 

SEQ QPKQQENGSSSQQMDDLFDILIQSGEISADFKEPPSLPGKEKPSPKTVCGSPLAAQPSPS 

SEG xxxxxxxxxxx 

PRD ccccccccccccchhhhhhhhhcccccccccccccccccccccccccccccccccccccc 

COILS 

MEM 

SEQ AELPQAAPPPPGSPSLPGRLEDFLESSTGLPLLTSGHDGPEPLSLIDDLHSQMLSSTAIL 

SEG xxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhcccceee 

COILS 

MEM 

SEQ DHPPSPMDTSELHFVPEPSSTMGLDLADGHLDSMDWLELSSGGPVLSLAPLSTTAPSLFS 

SEG 

PRD ccccccccccccccccccccccccccccccccccceeeeccccceeeeeecccccccccc 

COILS 

MEM 

SEQ TDFLDGHDLQLHWDSCL 

SEG 

PRD cccccccceeecccccc 

COILS 

MEM 

(No Prosite data available for DKFZphtes3_22nl3.3) 
(No Pfam data available for DKFZphtes3_22nl3.3) 

DKFZphtes3_23111 



group: intracellular transport and trafficking 

DKFZphtes3_23111 encodes a novel 186 amino acid protein nearly identical to mouse 
ADP-ribosylation-like factor homolog 6 (Arl6). 

Protein secretion through the endoplasmic reticulum and the Golgi vesicular trafficking system 
is initiated by the binding of ADP-ribosylation factors (ARFs) to donor membranes, leading to 
recruitment of coatomer, bud formation, and eventual vesicle release. ARFs are approximately 
20-kDa GTPases that are active with bound GTP and inactive with bound GDP. The novel protein 
contains an ATP/GTP-binding site motif A (P-loop) and seems to be a novel ARF. It seems to have 
an important role in vesicular transport and vesicular trafficking. 

The new protein can find application in modulating vesicle transport and trafficking in cells. 

nearly identical to mouse Arl6, ADP-ribosylation-like factor homolog 
start at Bp 15 matches Kozak consensus ANNatgG 
Sequenced by LMU 
Locus: unknown 
Insert length: 717 bp 

Poly A stretch at pos. 689, no polyadenylation signal found 
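
The Kozak annotation above can be checked directly against the first bases of the insert. The sketch below is a minimal illustration, assuming 1-based positions, a consensus written with IUPAC codes (N = any base, R = purine, Y = pyrimidine) and the ATG at offset 3 of the pattern; the function name and the IUPAC table are hypothetical helpers, not part of the original analysis.

```python
# Subset of IUPAC nucleotide codes needed for simple consensus patterns
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "R": "AG", "Y": "CT"}


def matches_kozak(seq, atg_pos, pattern="ANNATGG"):
    """Check whether the start codon at 1-based position atg_pos fits the pattern.

    The pattern is assumed to carry the ATG at offset 3, i.e. it covers
    positions -3 to +4 relative to the A of the start codon.
    """
    start = atg_pos - 1 - 3
    if start < 0:
        return False
    window = seq[start:start + len(pattern)].upper()
    if len(window) < len(pattern):
        return False
    return all(base in IUPAC[sym] for base, sym in zip(window, pattern.upper()))


# First 20 nucleotides of the insert listed below; start codon at bp 15
print(matches_kozak("ATTTGAATCACATTATGGGA", 15))
```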

1 ATTTGAATCA CATTATGGGA TTGCTAGACA GACTTTCAGT CTTGCTTGGC 
51 CTGAAGAAGA AGGAGGTTCA TGTTTTGTGC CTTGGGCTAG ATAATAGTGG 
101 CAAAACGACG ATCATTAACA AACTTAAACC TTCAAATGCT CAATCTCAAA 
151 ATATCCTTCC AACAATAGGA TTCAGCATAG AGAAATTCAA ATCATCCAGT 
201 TTGTCATTTA CAGTGTTTGA CATGTCAGGT CAAGGAAGAT ACAGAAATCT 
251 CTGGGAACAC TATTATAAAG AAGGCCAAGC TATTATTTTT GTCATTGATA 
301 GTAGTGATAG ATTAAGAATG GTTGTGGCCA AAGAAGAACT CGATACTCTT 
351 CTGAATCATC CAGATATTAA ACACCGTCGA ATTCCAATCT TATTCTTTGC 
401 AAATAAAATG GATCTTAGAG ATGCAGTGAC ATCTGTAAAA GTGTCTCAGT 
451 TGCTGTGTTT AGAGAACATC AAAGATAAAC CCTGGCATAT TTGTGCTAGT 
501 GATGCCATAA AAGGAGAAGG CTTGCAAGAA GGTGTAGACT GGCTTCAAGA 
551 TCAGATCCAG ACTGTGAAGA CATGAAAAGA TAATAGTTGG AAACCTCAGC 
601 AATTTTCAAT TCAAGGAATC TATCTAAGAC AAATAGAATA CATTTTGTAA 
651 AAGATGTTTA TGCATCAAAA AATATAATTT TCTGCTTGCA AAAAAAAAAA 
701 AAAAAAAAAA AAAAAAG 

BLAST Results 



No BLAST result 

Medline entries 

No Medline entry 

Peptide information for frame 3 



ORF from 15 bp to 572 bp; peptide length: 186 
Category: strong similarity to known protein 
Classification: Intracellular transport and trafficking 
Prosite motifs: ATP_GTP_A (24-32) 



1 MGLLDRLSVL LGLKKKEVHV LCLGLDNSGK TTIINKLKPS NAQSQNILPT 

51 IGFSIEKFKS SSLSFTVFDM SGQGRYRNLW EHYYKEGQAI IFVIDSSDRL 

101 RMVVAKEELD TLLNHPDIKH RRIPILFFAN KMDLRDAVTS VKVSQLLCLE 

151 NIKDKPWHIC ASDAIKGEGL QEGVDWLQDQ IQTVKT 
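
The peptide above can be re-derived from the nucleotide sequence by translating the ORF in frame. The following is a minimal sketch, assuming the standard genetic code and a 1-based start coordinate; the function name and codon-table construction are illustrative, and only the first 50 nucleotides of the insert are used here.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq, start_bp=1):
    """Translate seq from the 1-based position start_bp until a stop codon
    or until fewer than three nucleotides remain."""
    seq = seq.upper()
    peptide = []
    for i in range(start_bp - 1, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# First 50 nucleotides of the insert; the ORF is annotated as starting at bp 15
first_50 = "ATTTGAATCACATTATGGGATTGCTAGACAGACTTTCAGTCTTGCTTGGC"
print(translate(first_50, start_bp=15))
```

The translated fragment corresponds to the first twelve residues of the peptide listed above (MGLLDRLSVLLG).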



BLASTP hits 

No BLASTP hits available 
Alert BLASTP hits for DKFZphtes3_23111, frame 3 

TREMBL:AF031903_1 gene: "Arl6"; product: "ADP-ribosylation-like factor 
homolog ARL6"; Mus musculus ADP-ribosylation-like factor homolog ARL6 
(Arl6) mRNA, complete cds., N = 1, Score = 923, P = 1.1e-92 

TREMBL:CEC38D4_5 gene: "C38D4.8"; Caenorhabditis elegans cosmid C38D4, 
N = 1, Score = 418, P = 3.6e-39 

PIR:S66337 ADP-ribosylation factor 1 - Chlamydomonas reinhardtii, N = 
1, Score = 373, P = 2.1e-34 

SWISSPROT:ARF1_CHLRE ADP-RIBOSYLATION FACTOR 1., N = 1, Score = 372, P 
= 2.7e-34 



>TREMBL:AF031903_1 gene: "Arl6"; product: "ADP-ribosylation-like factor 
homolog ARL6"; Mus musculus ADP-ribosylation-like factor homolog ARL6 
(Arl6) mRNA, complete cds. 
Length = 186 

HSPs: 

Score = 923 (138.5 bits), Expect = 1.1e-92, P = 1.1e-92 
Identities = 178/186 (95%), Positives = 184/186 (98%) 



Query: 1 MGLLDRLSVLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQNILPTIGFSIEKFKS 60 

MGLLDRLS LLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQ+I+PTIGFSIEKFKS 
Sbjct: 1 MGLLDRLSGLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQDIVPTIGFSIEKFKS 60 

Query: 61 SSLSFTVFDMSGQGRYRNLWEHYYKEGQAIIFVIDSSDRLRMVVAKEELDTLLNHPDIKH 120 

SSLSFTVFDMSGQGRYRNLWEHYYK+GQAIIFVIDSSD+LRMVVAKEELDTLLNHPDIKH 
Sbjct: 61 SSLSFTVFDMSGQGRYRNLWEHYYKDGQAIIFVIDSSDKLRMVVAKEELDTLLNHPDIKH 120 

Query: 121 RRIPILFFANKMDLRDAVTSVKVSQLLCLENIKDKPWHICASDAIKGEGLQEGVDWLQDQ 180 

RRIPILFFANKMDLRD+VTSVKVSQLLCLE+IKDKPWHICASDAIKGEGLQEGVDWLQDQ 
Sbjct: 121 RRIPILFFANKMDLRDSVTSVKVSQLLCLESIKDKPWHICASDAIKGEGLQEGVDWLQDQ 180 

Query: 181 IQTVKT 186 

IQ VKT 
Sbjct: 181 IQAVKT 186 





Pedant information for DKFZphtes3_23111, frame 3 



Report for DKFZphtes3_23111.3 



[LENGTH] 186 

[MW] 21097.69 

[pI] 8.72 

[HOMOL] TREMBL:AF031903_1 gene: "Arl6"; product: "ADP-ribosylation-like factor homolog 

ARL6"; Mus musculus ADP-ribosylation-like factor homolog ARL6 (Arl6) mRNA, complete cds. 4e-94 



[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YDL192w] 1e-36 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YDL192w] 1e-36 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL192w] 1e-36 
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YDL137w] 2e-36 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YBR164c] 2e-32 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YBR164c] 2e-32 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YMR138w] 4e-19 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YMR138w] 4e-19 
[FUNCAT] r general function prediction [M. jannaschii, MJ1339] 2e-05 
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YHR005c] 4e-05 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YHR005c] 4e-05 
[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YHR005c] 4e-05 
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YKR014c] 2e-04 
[FUNCAT] 08.19 cellular import [S. cerevisiae, YKR014c] 2e-04 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YKR014c] 2e-04 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YFL005w] 4e-04 
[BLOCKS] BL01288C 
[BLOCKS] BL01020C SAR1 family proteins 
[BLOCKS] BL01019C ADP-ribosylation factors family proteins 
[BLOCKS] BL01019B ADP-ribosylation factors family proteins 
[BLOCKS] BL01019A ADP-ribosylation factors family proteins 
[SCOP] d1as3_2 3.29.1.4.12 Transducin (alpha subunit), insertion domai 2e-45 
[SCOP] d1mh1 3.29.1.4.2 Rac1 [Human (Homo sapiens) 2e-46 
[SCOP] d5p21 3.29.1.4.1 cH-p21 Ras protein [human (Homo sapiens) 5e-37 
[SCOP] d1hura 3.29.1.4.8 ADP-ribosylation factor 1 (ARF1) [human (Hom 4e-61 
[SCOP] d1a2kc_ 3.29.1.4.5 Ran Nuclear transport factor-2 (NTF2) [Do 4e-33 
[PIRKW] glycoprotein 2e-33 
[PIRKW] monomer 3e-31 
[PIRKW] P-loop 2e-35 
[PIRKW] lipoprotein 2e-33 
[PIRKW] GTP binding 2e-35 
[SUPFAM] ADP-ribosylation factor 2e-35 
[PROSITE] ATP_GTP_A 1 
[PFAM] ADP-ribosylation factors (Arf family) (contains ATP/GTP binding P-loop) 
[KW] Alpha_Beta 
[KW] 3D 
[KW] LOW_COMPLEXITY 5.91 % 



SEQ MGLLDRLSVLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQNILPTIGFSIEKFKS 

SEG . . xxxxxxxxxxx 

1hurA CCCCEEEEEETTTTCHHHHHHHHCCCCEEEE--EEETTEEEEEEEE 

SEQ SSLSFTVFDMSGQGRYRNLWEHYYKEGQAIIFVIDSSDRLRMVVAKEELDTLLNHPDIKH 

SEG 

1hurA TTEEEEEEETTTTTTTCCCHHHHHHCEEEEEEEEETTTTTHHHHHHHHHHHHHHTTTT-- 

SEQ RRIPILFFANKMDLRDAVTSVKVSQLLCLENIKDKPWHICASDAIKGEGLQEGVDWLQDQ 

SEG 

1hurA TTTEEEEEEETTTTTTTCCHHHHHHHHCGGGTTTTCEEEEECBTTTTBTHHHHHHHHHHH 

SEQ IQTVKT 

SEG 

1hurA HHHHC . 



Prosite for DKFZphtes3_23111.3 
PS00017 24->32 ATP_GTP_A PDOC00017 



Pfam for DKFZphtes3_23111.3 

HMM_NAME ADP-ribosylation factors (Arf family) (contains ATP/GTP binding P-loop) 

HMM *GMgWfsIFrkMWGlWNKEMRILMLGLDNAGKTTILYMLKlgE..IVTTI 
MG++ ++ ++GL +KE+++L LGLDN+GKTTI+++LK+ ++ 
Query 1 -MGLLDRLSVLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQNIL 48 

HMM PTIGFNVETVeYKNIKFNVWDVGGQdsIRPYWRHYYpNTDGIIWVVDSaD 
PTIGF +E+ + ++F+V+D GQ + R +W HYY + ++II+V+DS+D 
Query 49 PTIGFSIEKFKSSSLSFTVFDMSGQGRYRNLWEHYYKEGQAIIFVIDSSD 98 

HMM RDRMeEaKqELHaMLNEEEL..rDAPlLIFANKQDLPgAMSesEIREaLG 
R RM AK+EL+ +LN+ ++ R+ P+L FANK DL++A+++ +++ +L 
Query 99 RLRMVVAKEELDTLLNHPDIKHRRIPILFFANKMDLRDAVTSVKVSQLLC 148 

HMM LHelRCnRPWYIQMCCAVtGEGLYEGMDWLSNYInkRkK* 
L++I+ + PW+I +++A++GEGL+EG DWL ++I+ K 
Query 149 LENIK-DKPWHICASDAIKGEGLQEGVDWLQDQIQTVKT 186 
DKFZphtes3_23nl9 



group: testes derived 

DKFZphtes3_23nl9 encodes a novel 387 amino acid protein with similarity to rat protein kinase 
C-interacting RBCC protein 1. 

The novel protein does not contain the RING-B box-coiled coil (RBCC) motif of RBCC protein 1 
and thus is not a member of this subgroup of RING finger proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to rat protein kinase C-interacting RBCC protein 1 

start at Bp 209 matches Kozak consensus PyNNatgG 
similarity of C-terminal part to N-terminus of RBCK1 

Sequenced by LMU 

Locus: unknown 

Insert length: 1579 bp 

Poly A stretch at pos. 1535, polyadenylation signal at pos. 1515 



1 CGGAGACCCT CGGGCCGTGT CCATTTGTGG GCAAAGCCAG CGGGGCAGGC 
51 TTGGCCAGAG TGCACCACTC GGCGCCGTCC CAGGCCCGAC GCTCTGGGCG 
101 CGCCCGGAAC CCCAGGTTCG CGGCCCGTGT TTCCGACCGG CGGAGGGGGC 
151 TCAGCGGCCC GATCCCACGG AAGCGCGCTC GGAGGGGTGG GACCCGGCCG 
201 GACCGGAGAT GGCGCCGCCA GCGGGCGGGG CGGCGGCGGC GGCCTCGGAC 
251 TTGGGCTCCG CCGCAGTGCT CTTGGCTGTG CACGCCGCGG TGAGGCCGCT 
301 GGGCGCCGGG CCAGACGCCG AGGCACAGCT GCGGAGGCTG CAGCTGAGCG 
351 CGGACCCTGA GAGGCCTGGG CGCTTCCGGC TGGAGCTGCT GGGCGCGGGA 
401 CCTGGGGCGG TTAATTTGGA GTGGCCCCTG GAGTCAGTTT CCTACACCAT 
451 CCGAGGCCCC ACCCAGCACG AGCTACAGCC TCCACCAGGA GGGCCTGGAA 
501 CCCTCAGCCT GCACTTCCTC AACCCTCAGG AAGCTCAGCG GTGGGCAGTC 
551 CTAGTCCGAG GTGCCACCGT GGAAGGACAG AATGGCAGCA AGAGCAACTC 
601 ACCACCAGCC TTGGGCCCAG AAGCATGCCC TGTCTCCCTG CCCAGTCCCC 
651 CGGAAGCCTC CACACTCAAG GGCCCTCCAC CTGAGGCAGA TCTTCCTAGG 
701 AGCCCTGGAA ACTTGACGGA GAGAGAAGAG CTGGCAGGGA GCCTGGCCCG 
751 GGCTATTGCA GGTGGAGACG AGAAGGGGGC AGCCCAAGTG GCAGCCGTCC 
801 TGGCCCAGCA TCGTGTGGCC CTGAGTGTTC AGCTTCAGGA GGCCTGCTTC 
851 CCACCTGGCC CCATCAGGCT GCAGGTCACA CTTGAAGACG CTGCCTCTGC 
901 CGCATCCGCC GCGTCCTCTG CACACGTTGC CCTGCAGGTC CACCCCCACT 
951 GCACTGTTGC AGCTCTCCAG GAGCAGGTGT TCTCAGAGCT CGGTTTCCCG 
1001 CCAGCCGTGC AACGCTGGGT CATCGGACGG TGCCTGTGTG TGCCTGAGCG 
1051 CAGCCTTGCC TCTTACGGGG TTCGGCAGGA TGGGGACCCT GCTTTCCTCT 
1101 ACTTGCTGTC AGCTCCTCGA GAAGCCCCAG CCACAGGACC TAGCCCTCAG 
1151 CACCCCCAGA AGATGGACGG GGAACTTGGA CGCTTGTTTC CCCCATCATT 
1201 GGGGCTACCC CCAGGCCCCC AGCCAGCTGC CTCCAGCCTG CCCAGTCCAC 
1251 TCCAGCCCAG CTGGTCCTGT CCTTCCTGCA CCTTCATCAA TGCCCCAGAC 
1301 CGCCCTGGCT GTGAGATGTG TAGCACCCAG AGGCCCTGCA CTTGGGACCC 
1351 CCTTGCTGCA GCTTCCACCT AGCAGCCACC AGAGGTTACA AGGGGAGAGT 
1401 GGCCCTTCCC TCACAAGTCC GACATCTCCA GGCCCCCACT GAACTCCGGG 
1451 GACCTCTACT GACTGCTTGC TGGGACAGTC ACCAGGGTTG GGGGGAAGGG 
1501 CCACAAAATG AAACCATTAA AGACCCTTAA GAGCCAAAAA AAAAAAAAAA 
1551 AAAAAAAAAA AAAAAAAAAA AAAAAAAAG 
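
The reading frame can be spot-checked against the sequence row that starts at position 201. The sketch below assumes the standard genetic code and the ORF start at bp 209 stated in the peptide information for this clone. `translate` and `ROW_201` are illustrative names, not part of the original analysis pipeline.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Nucleotides 201-250 of the insert, copied from the row labelled 201.
ROW_201 = "GACCGGAGAT GGCGCCGCCA GCGGGCGGGG CGGCGGCGGC GGCCTCGGAC".replace(" ", "")


def translate(dna, offset):
    """Translate complete codons of dna starting at a 0-based offset."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


# The ORF starts at bp 209, i.e. 8 bases into the row that starts at bp 201.
peptide_start = translate(ROW_201, 209 - 201)
print(peptide_start)
```

The translated fragment should agree with the first residues of the frame 2 peptide listed for this clone.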



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 209 bp to 1369 bp; peptide length: 387 
Category: similarity to known protein 
Classification: Cell signaling/communication 
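
The ORF coordinates and the peptide length are related by simple arithmetic. The sketch below assumes inclusive 1-based coordinates that do not include the stop codon; that convention is inferred from the numbers in this document rather than stated in it. `peptide_length` is a hypothetical helper, and the coordinates are those reported for the three clones in this section.

```python
def peptide_length(orf_start, orf_end):
    """Number of codons in an ORF given inclusive 1-based coordinates."""
    span = orf_end - orf_start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


# (start bp, end bp, reported peptide length), as listed in the reports.
REPORTED_ORFS = {
    "DKFZphtes3_23nl9": (209, 1369, 387),
    "DKFZphtes3_26g22": (130, 2823, 898),
    "DKFZphtes3_27dl": (251, 2386, 712),
}

for clone, (start, end, reported) in REPORTED_ORFS.items():
    print(clone, peptide_length(start, end), reported)
```

For all three clones the computed codon count equals the reported peptide length, which is consistent with the end coordinate pointing at the last coding base.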



1 MAPPAGGAAA AASDLGSAAV LLAVHAAVRP LGAGPDAEAQ LRRLQLSADP 
51 ERPGRFRLEL LGAGPGAVNL EWPLESVSYT IRGPTQHELQ PPPGGPGTLS 
101 LHFLNPQEAQ RWAVLVRGAT VEGQNGSKSN SPPALGPEAC PVSLPSPPEA 
151 STLKGPPPEA DLPRSPGNLT EREELAGSLA RAIAGGDEKG AAQVAAVLAQ 
201 HRVALSVQLQ EACFPPGPIR LQVTLEDAAS AASAASSAHV ALQVHPHCTV 
251 AALQEQVFSE LGFPPAVQRW VIGRCLCVPE RSLASYGVRQ DGDPAFLYLL 
301 SAPREAPATG PSPQHPQKMD GELGRLFPPS LGLPPGPQPA ASSLPSPLQP 
351 SWSCPSCTFI NAPDRPGCEM CSTQRPCTWD PLAAAST 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_23nl9, frame 2 

PIR:JC5983 protein kinase C-interacting RBCC protein 1 - rat, N = 1, 
Score = 353, P = 2.8e-32 

TREMBL:AB011369_1 product: "RBCK2"; Rattus norvegicus mRNA for RBCK2, 
complete cds., N = 1, Score = 353, P = 2.8e-32 

TREMBL:U67322_1 gene: "XAP4"; product: "HBV associated factor"; Human 
HBV associated factor (XAP4) mRNA, complete cds., N = 1, Score = 286, P 
= 8.5e-25 

TREMBLNEW:AF124663_1 product: "UbcM4 interacting protein 28"; Mus 
musculus UbcM4 interacting protein 28 mRNA, complete cds., N = 1, Score 
= 367, P = 9.3e-34 



>TREMBLNEW:AF124663_1 product: "UbcM4 interacting protein 28"; Mus musculus 
UbcM4 interacting protein 28 mRNA, complete cds. 
Length = 498 

HSPs: 

Score = 367 (55.1 bits), Expect = 9.3e-34, P = 9.3e-34 
Identities = 95/212 (44%), Positives = 129/212 (60%) 

Query: 175 LAGSLARAIAGGDEKGAAQVAAVLAQHRVALSVQLQEACFPPGPIRLQVTLEDAASAASA 234 
+A SLARA+AGGDE+ A + A LA+ RV L VQ++ P IRL V++EDA 
Sbjct: 1 MALSLARAVAGGDEQAAIKYATWLAEQRVPLRVQVKPEVSPTQDIRLCVSVEDAYM 56 

Query: 235 ASSAHVALQVHPHCTVAALQEQVFSELGFPPAVQRWVIGRCLCVPERSLASYGVRQDGDP 294 
+ + L V P TVA+L++ VF + GFPP++Q+WV+G+ L + +L S+G+R++GD 
Sbjct: 57 -HTVTIWLTVRPDMTVASLKDMVFLDYGFPPSLQQWVVGQRLARDQETLHSHGIRRNGDG 115 

Query: 295 AFLYLLSAPREAPATGPSPQHPQK MDGELG--RLFPPSLG-LPPG-PQPAASSLP 345 
A+LYLLSA T +PQ Q+ M +LG L S G L P P+P + P 
Sbjct: 116 AYLYLLSARN TSLNPQELQRQRQLRMLEDLGFKDLTLQSRGPLEPVLPKPRTNQEP 171 

Query: 346 SPLQP-------SWSCPSCTFINAPDRPGCEMCSTQRPCTW 379 
+P P W CP CTFIN P RPGCEMC RP T+ 
Sbjct: 172 GQPDAAPESPPVGWQCPGCTFINKPTRPGCEMCCRARPETY 212 
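
The percentages in the Identities and Positives lines can be recomputed from the listed counts. The sketch below assumes that the report truncates the percentage to an integer instead of rounding it; this is an inference from the figures and is not documented in the report itself. `blast_percent` is an illustrative name.

```python
def blast_percent(matches, alignment_length):
    """Integer percentage, truncated toward zero (assumed report convention)."""
    return matches * 100 // alignment_length


identities = blast_percent(95, 212)
positives = blast_percent(129, 212)
print(identities, positives)

# Rounding to the nearest integer would give different values for this HSP.
print(round(95 * 100 / 212), round(129 * 100 / 212))
```

Truncation reproduces the reported 44% and 60%, whereas conventional rounding would give 45% and 61%.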

Pedant information for DKFZphtes3_23nl9, frame 2 



Report for DKFZphtes3_23nl9.2 

[LENGTH] 387 

[MW] 39949.29 

[pI] 5.53 

[HOMOL] TREMBLNEW:AF124663_1 product: "UbcM4 interacting protein 28"; Mus musculus 
UbcM4 interacting protein 28 mRNA, complete cds. 1e-22 

[BLOCKS] BL00578B 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 17.57 % 

SEQ MAPPAGGAAAAASDLGSAAVLLAVHAAVRPLGAGPDAEAQLRRLQLSADPERPGRFRLEL 
SEG . xxxxxxxxxxxxxxxxxxxxxxxxxxx 



PRD cccccchhhhhhhhhhhhhhhhhhhhhccccccccchhhhhhhhhhhccccccccceeee 
SEQ LGAGPGAVNLEWPLESVSYTIRGPTQHELQPPPGGPGTLSLHFLNPQEAQRWAVLVRGAT 

SEG 

PRD ccccccceeecccceeeeeeccccccccccccccccceeeeeecccchhhhhheeeecce 

SEQ VEGQNGSKSNSPPALGPEACPVSLPSPPEASTLKGPPPEADLPRSPGNLTEREELAGSLA 

SEG 

PRD eecccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhh 

SEQ RAIAGGDEKGAAQVAAVLAQHRVALSVQLQEACFPPGPIRLQVTLEDAASAASAASSAHV 

SEG xxxxxxxxxxx . . 

PRD hhhhcccchhhhhhhhhhhhhhhhhhccccccccccccceeeccchhhhhhhhhhhhhee 

SEQ ALQVHPHCTVAALQEQVFSELGFPPAVQRWVIGRCLCVPERSLASYGVRQDGDPAFLYLL 

SEG 

PRD eeeccccchhhhhhhhhhhhccccccchhhhhhhhhhhhccccccccccccccceeeeec 

SEQ SAPREAPATGPSPQHPQKMDGELGRLFPPSLGLPPGPQPAASSLPSPLQPSWSCPSCTFI 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx. . . 

PRD cccccccccchhhhhhhhhhhhccccccccccccccccccccccccccccccccccceee 

SEQ NAPDRPGCEMCSTQRPCTWDPLAAAST 

SEG 

PRD ccccccccccccccccccccceeeccc 
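
The LOW_COMPLEXITY percentage can be related to a residue count. The sketch below assumes that the percentage is the number of SEG-masked residues (the x marks in the SEG rows) divided by the peptide length; that interpretation is an assumption and is not stated in the report. `masked_residues` is a hypothetical helper.

```python
def masked_residues(percent, length):
    """Residue count implied by a LOW_COMPLEXITY percentage and a peptide length."""
    return round(percent / 100 * length)


# Values reported for DKFZphtes3_23nl9.2: [LENGTH] 387, LOW_COMPLEXITY 17.57 %
count_23nl9 = masked_residues(17.57, 387)
print(count_23nl9, f"{100 * count_23nl9 / 387:.2f}")

# Values reported for DKFZphtes3_26g22.1: [LENGTH] 898, LOW_COMPLEXITY 8.57 %
count_26g22 = masked_residues(8.57, 898)
print(count_26g22, f"{100 * count_26g22 / 898:.2f}")
```

In both cases the implied whole-residue count reproduces the reported percentage to two decimal places.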



(No Prosite data available for DKFZphtes3_23nl9.2) 
(No Pfam data available for DKFZphtes3_23nl9.2) 



DKFZphtes3_26g22 



group: intracellular transport/trafficking 

DKFZphtes3_26g22 encodes a novel 898 amino acid protein with similarity to kinesins. 

The novel protein contains an ATP/GTP-binding site motif A (P-loop) and a kinesin motor domain 
signature. Kinesin is a microtubule-associated force-producing protein that plays a role in 
organelle transport. It is an oligomeric complex composed of two heavy chains and two light 
chains. The kinesin motor activity is directed toward the microtubule's plus end. The heavy 
chain contains a large globular N-terminal domain which is responsible for the motor activity 
of kinesin, which is known to hydrolyze ATP and to bind and move on microtubules. Several 
proteins involved in chromosome segregation and cell division contain this motor domain, such 
as Drosophila claret segregational protein (ncd), Drosophila kinesin-like protein (nod), human 
CENP-E and human mitotic kinesin-like protein-1 (MKLP-1). The novel protein is a new 
kinesin-like protein. 

The new protein can find application in modulating chromosome transport in mitosis and meiosis 
and in modulating cell division. 



strong similarity to kinesins 

Sequenced by EMBL 

Locus : unknown 

Insert length: 3032 bp 

No poly A stretch found, no polyadenylation signal found 



1 CTGAAGCGCT GGGAGGCGGA CATTAAAGTG AAGTGGTTGC GGTAACCTGG 
51 CCTGGGCCTG AAGTGAGTGA GAGGCACATG AAGAGAAGTA TTCAAGTATT 
101 TATACAGATA GGAATCAAGA TAATCAACAA TGTCTGTCAC TGAGGAAGAC 
151 CTGTGCCACC ATATGAAAGT AGTAGTTCGT GTACGTCCGG AAAACACTAA 
201 AGAAAAAGCA GCTGGATTTC ATAAAGTGGT TCATGTTGTG GATAAACATA 
251 TCCTAGTTTT TGATCCCAAA CAAGAAGAAG TCAGTTTTTT CCATGGAAAG 
301 AAAACTACAA ATCAAAATGT TATAAAGAAA CAAAATAAGG ATCTTAAATT 
351 TGTATTTGAT GCTGTTTTTG ATGAAACGTC AACTCAGTCA GAAGTTTTTG 
401 AACACACTAC TAAGCCAATT CTTCGTAGTT TTTTGAATGG ATATAATTGC 
451 ACAGTACTTG CCTATGGTGC CACTGGTGCT GGGAAGACCC ACACTATGCT 
501 AGGATCAGCT GATGAACCTG GAGTGATGTA TCTAACAATG TTACACCTTT 
551 ACAAATGCAT GGATGAGATT AAAGAAGAGA AAATATGTAG TACTGCAGTT 
601 TCATATCTGG AGGTATATAA TGAACAGATT CGTGATCTCT TAGTAAATTC 
651 AGGGCCACTT GCTGTCCGGG AAGATACCCA AAAAGGGGTG GTCGTTCATG 
701 GACTTACTTT ACACCAGCCC AAATCCTCAG AAGAAATTTT ACATTTATTG 
751 GATAATGGAA ACAAAAACAG GACACAACAT CCCACTGATA TGAATGCCAC 
801 ATCTTCTCGT TCTCATGCTG TTTTCCAAAT TTACTTGCGA CAACAAGACA 
851 AAACAGCAAG TATCAATCAA AATGTCCGTA TTGCCAAGAT GTCACTCATT 
901 GACCTGGCAG GATCTGAGCG AGCAAGTACT TCCGGTGCTA AGGGGACCCG 
951 ATTTGTAGAA GGCACAAATA TTAATAGATC ACTTTTAGCT CTTGGGAATG 
1001 TCATCAATGC CTTAGCAGAT TCAAAGAGAA AGAATCAGCA TATCCCTTAC 
1051 AGAAATAGTA AGCTTACTCG CTTGTTAAAG GATTCTCTTG GAGGAAACTG 
1101 TCAAACTATA ATGATAGCTG CTGTTAGTCC TTCCTCTGTA TTCTACGATG 
1151 ACACATATAA CACTCTTAAG TATGCTAACC GGGCAAAGGA CATTAAATCT 
1201 TCTTTGAAGA GCAATGTTCT TAATGTCAAT AATCATATAA CTCAATATGT 
1251 AAAGATCTGT AATGAGCAGA AGGCAGAGAT TTTATTGTTA AAAGAAAAAC 
1301 TAAAAGCCTA TGAAGAACAG AAAGCCTTCA CTAATGAAAA TGACCAAGCA 
1351 AAGTTAATGA TTTCAAACCC TCAGGAAAAA GAAATCGAAA GGTTTCAAGA 
1401 AATCCTGAAC TGCTTGTTCC AGAATCGAGA AGAAATTAGA CAAGAATATC 
1451 TGAAGTTGGA AATGTTACTT AAAGAAAATG AACTTAAATC ATTCTACCAA 
1501 CAACAGTGCC ATAAACAAAT AGAAATGATG TGTTCTGAAG ACAAAGTAGA 
1551 AAAGGCCACT GGAAAACGAG ATCATAGACT TGCAATGTTG AAAACTCGTC 
1601 GCTCCTACCT GGAGAAAAGG AGGGAGGAGG AATTGAAGCA ATTTGATGAG 
1651 AATACTAATT GGCTCCATCG TGTCGAAAAA GAAATGGGAC TCTTAAGTCA 
1701 AAACGGTCAT ATTCCAAAGG AACTCAAGAA AGATCTTCAT TGTCACCATT 
1751 TGCACCTCCA GAACAAAGAT TTGAAAGCAC AAATTAGACA TATGATGGAT 
1801 CTAGCTTGTC TTCAGGAACA GCAACACAGG CAGACTGAAG CAGTATTGAA 
1851 TGCTTTACTT CCAACCCTAA GAAAACAATA TTGCACATTA AAAGAAGCCG 
1901 GCCTGTCAAA TGCTGCTTTT GAATCTGACT TCAAAGAGAT CGAACATTTG 
1951 GTAGAGAGGA AAAAAGTGGT AGTTTGGGCT GACCAAACTG CCGAACAACC 
2001 AAAGCAAAAC GATCTACCAG GGATTTCTGT TCTTATGACC TTTCCACAAC 
2051 TTGGACCAGT TCAGCCTATT CCTTGTTGCT CATCTTCAGG TGGAACTAAT 
2101 CTGGTTAAGA TTCCTACAGA AAAAAGAACT CGGAGAAAAC TAATGCCATC 
2151 TCCCTTGAAA GGACAGCATA CTCTAAAGTC TCCACCATCT CAAAGTGTGC 
2201 AGCTCAATGA TTCTCTTAGC AAAGAACTTC AGCCTATTGT ATATACACCA 
2251 GAAGACTGTA GAAAAGCTTT TCAAAATCCG TCTACAGTAA CCTTAATGAA 
2301 ACCATCATCA TTTACTACAA GTTTTCAGGC TATCAGCTCA AACATAAACA 
2351 GTGATAATTG TCTGAAAATG TTGTGTGAAG TAGCTATCCC TCATAATAGA 
2401 AGAAAAGAAT GTGGACAGGA GGACTTGGAC TCTACATTTA CTATATGTGA 
2451 AGACATCAAG AGCTCGAAGT GTAAATTACC CGAACAAGAA TCACTACCAA 
2501 ATGATAACAA AGACATTTTA CAACGGCTTG ATCCTTCTTC ATTCTCAACT 
2551 AAGCATTCTA TGCCTGTACC AAGCATGGTG CCATCCTACA TGGCAATGAC 
2601 TACTGCTGCC AAAAGGAAAC GGAAATTAAC AAGTTCTACA TCAAACAGTT 
2651 CGTTAACTGC AGACGTAAAT TCTGGATTTG CCAAACGTGT TCGACAAGAT 
2701 AATTCAAGTG AGAAGCACTT ACAAGAAAAC AAACCAACAA TGGAACATAA 
2751 AAGAAACATC TGTAAAATAA ATCCAAGCAT GGTTAGAAAA TTTGGAAGAA 
2801 ATATTTCAAA AGGAAATCTA AGATAAATCA CTTCAAAACC AAGCAAAATG 
2851 AAGTTGATCA AATCTGCTTT TCAAAGTTTA TCAATACCCT TTCAAAAATA 
2901 TATTTAAAAT CTTTGAAAGA AGACCCATCT TAAAGCTAAG TTTACCCAAG 
2951 TACTTTCAGC AAGCAGAAAA ATGAAACTCT TTGTTTTCTT CTTTTGTGTT 
3001 CTAAAAAAAT AAAATTTCAA AAGAAAAAAA AA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 130 bp to 2823 bp; peptide length: 898 
Category: strong similarity to known protein 
Classification: Cell structure/motility 
Prosite motifs: ATP_GTP_A (113-121) 
KINESIN_MOTOR_DOMAIN1 (252-264) 



1 MSVTEEDLCH HMKVVVRVRP ENTKEKAAGF HKVVHVVDKH ILVFDPKQEE 
51 VSFFHGKKTT NQNVIKKQNK DLKFVFDAVF DETSTQSEVF EHTTKPILRS 
101 FLNGYNCTVL AYGATGAGKT HTMLGSADEP GVMYLTMLHL YKCMDEIKEE 
151 KICSTAVSYL EVYNEQIRDL LVNSGPLAVR EDTQKGVVVH GLTLHQPKSS 
201 EEILHLLDNG NKNRTQHPTD MNATSSRSHA VFQIYLRQQD KTASINQNVR 
251 IAKMSLIDLA GSERASTSGA KGTRFVEGTN INRSLLALGN VINALADSKR 
301 KNQHIPYRNS KLTRLLKDSL GGNCQTIMIA AVSPSSVFYD DTYNTLKYAN 
351 RAKDIKSSLK SNVLNVNNHI TQYVKICNEQ KAEILLLKEK LKAYEEQKAF 
401 TNENDQAKLM ISNPQEKEIE RFQEILNCLF QNREEIRQEY LKLEMLLKEN 
451 ELKSFYQQQC HKQIEMMCSE DKVEKATGKR DHRLAMLKTR RSYLEKRREE 
501 ELKQFDENTN WLHRVEKEMG LLSQNGHIPK ELKKDLHCHH LHLQNKDLKA 
551 QIRHMMDLAC LQEQQHRQTE AVLNALLPTL RKQYCTLKEA GLSNAAFESD 
601 FKEIEHLVER KKVVVWADQT AEQPKQNDLP GISVLMTFPQ LGPVQPIPCC 
651 SSSGGTNLVK IPTEKRTRRK LMPSPLKGQH TLKSPPSQSV QLNDSLSKEL 
701 QPIVYTPEDC RKAFQNPSTV TLMKPSSFTT SFQAISSNIN SDNCLKMLCE 
751 VAIPHNRRKE CGQEDLDSTF TICEDIKSSK CKLPEQESLP NDNKDILQRL 
801 DPSSFSTKHS MPVPSMVPSY MAMTTAAKRK RKLTSSTSNS SLTADVNSGF 
851 AKRVRQDNSS EKHLQENKPT MEHKRNICKI NPSMVRKFGR NISKGNLR 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_26g22, frame 1 

SWISSPROT:YB3D_SCHPO PUTATIVE KINESIN-LIKE PROTEIN C2F12.13., N = 3, 
Score = 874, P = 9e-93 

TREMBL:DMU89264_1 product: "kinesin like protein 67a"; Drosophila 
melanogaster kinesin like protein 67a mRNA, complete cds., N = 1, Score 
= 880, P = 4.2e-88 

TREMBL:SPBC649_1 gene: "SPBC649.01c"; product: "putative kinesin-like 
protein"; S.pombe chromosome II cosmid c649., N = 3, Score = 814, P = 
9.8e-86 

PIR:S64238 kinesin-related protein KIP3 - yeast (Saccharomyces 
cerevisiae), N = 2, Score = 802, P = 2.5e-83 



>TREMBL:DMU89264_1 product: "kinesin like protein 67a"; Drosophila 
melanogaster kinesin like protein 67a mRNA, complete cds. 
Length = 814 

HSPs: 

Score = 880 (132.0 bits), Expect = 4.2e-88, P = 4.2e-88 
Identities = 181/345 (52%), Positives = 238/345 (68%) 

Query: 11 HMKVVVRVRPENTKEKAAGFHKVVHVVDKHILVFDPKQEEVSFF-HGKKTTNQNVIKKQN 69 
++KV VRVRP N +E ++ V+D+ L+FDP +E+ FF G K +++ K+ N 
Sbjct: 8 NIKVAVRVRPYNVRELEQKQRSIIKVMDRSALLFDPDEEDDEFFFQGAKQPYRDITKRMN 67 

Query: 70 KDLKFVFDAVFDETSTQSEVFEHTTKPILRSFLNGYNCTVLAYGATGAGKTHTMLGSADE 129 
K L FD VFD ++ ++FE T P++ + LNGYNC+V YGATGAGKT TMLGS 
Sbjct: 68 KKLTMEFDRVFDIDNSNQDLFEECTAPLVDAVLNGYNCSVFVYGATGAGKTFTMLGSEAH 127 

Query: 130 PGVMYLTMLHLYKCMDEIKEEKICSTAVSYLEVYNEQIRDLLVNSGPLAVREDTQKGVVV 189 
PG+ YLTM L+ + + + VSYLEVYNE + +LL SGPL +RED GVVV 
Sbjct: 128 PGLTYLTMQDLFDKIQAQSDVRKFDVGVSYLEVYNEHVMNLLTKSGPLKLREDNN-GVVV 186 

Query: 190 HGLTLHQPKSSEEILHLLDNGNKNRTQHPTDMNATSSRSHAVFQIYLRQQDKTASINQNV 249 
GL L S+EE+L +L GN +RTQHPTD NA SSRSHA+FQ+++R ++ + V 
Sbjct: 187 SGLCLTPIYSAEELLRMLMLGNSHRTQHPTDANAESSRSHAIFQVHIRITERKTDTKRTV 246 

Query: 250 RIAKMSLIDLAGSERASTSGAKGTRFVEGTNINRSLLALGNVINALADSKRKNQHIPYRN 309 

K+S+IDLAGSERA+++ G RF EG +IN+SLLALGN IN LAD + HIPYR+ 
Sbjct: 247 KLSMIDLAGSERAASTKGIGVRFKEGASINKSLLALGNCINKLADGLK HIPYRD 300 

Query: 310 SKLTRLLKDSLGGNCQTIMIAAVSPSSVFYDDTYNTLKYANRAKDI 355 

S LTR+LKDSLGGNC+T+M+A VS SS+ Y+DTYNTLKYA+RAK I 
Sbjct: 301 SNLTRILKDSLGGNCRTLMVANVSMSSLTYEDTYNTLKYASRAKKI 346 

Pedant information for DKFZphtes3_26g22, frame 1 



Report for DKFZphtes3_26g22.1 

[LENGTH] 898 

[MW] 102281.63 

[pI] 9.09 

[HOMOL] SWISSPROT:YB3D_SCHPO PUTATIVE KINESIN-LIKE PROTEIN C2F12.13. 3e-97 

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YGL216w] 2e-88 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YGL216w] 2e-88 

[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YGL216w] 2e-88 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YGL216w] 2e-88 

[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YPR141c] 5e-42 

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YPR141c] 5e-42 

[FUNCAT] 03.13 meiosis [S. cerevisiae, YPR141c] 5e-42 

[FUNCAT] 11.01 stress response [S. cerevisiae, YPR141c] 5e-42 

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins 
[S. cerevisiae, YPR141c] 5e-42 

[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YPR141c] 5e-42 

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YKL079w] 
1e-28 

[BLOCKS] BL00411H 

[BLOCKS] BL00411G 

[BLOCKS] BL00411F 

[BLOCKS] BL00411E Kinesin motor domain proteins 

[BLOCKS] BL00411C Kinesin motor domain proteins 

[BLOCKS] BL00411B Kinesin motor domain proteins 

[BLOCKS] BL00411A Kinesin motor domain proteins 

[SCOP] d2kin.1 3.29.1.5.3 Kinesin [Rat (Rattus norvegicus) 1e-117 

[SCOP] d3kar 3.29.1.5.4 Kinesin [Baker's yeast (Saccharomyce 1e-112 

[PIRKW] nucleus 6e-87 

[PIRKW] heterodimer 4e-68 

[PIRKW] DNA binding 9e-60 

[PIRKW] heterotetramer 2e-54 

[PIRKW] mitosis 9e-60 

[PIRKW] microtubule binding 4e-68 

[PIRKW] ATP 6e-87 

[PIRKW] phosphoprotein 5e-59 

[PIRKW] heterotrimer 4e-68 

[PIRKW] purine nucleotide binding 1e-26 

[PIRKW] P-loop 6e-87 

[PIRKW] coiled coil 4e-68 

[PIRKW] heptad repeat 3e-62 

[PIRKW] methylated amino acid 2e-54 

[PIRKW] hydrolase 2e-54 

[PIRKW] GTP binding 1e-60 
[PIRKW] cell division 5e-57 

[SUPFAM] kinesin-related protein KIP1 3e-50 

[SUPFAM] kinesin-related protein CIN8 7e-33 

[SUPFAM] kinesin heavy chain 2e-54 

[SUPFAM] suppressor protein SMY1 1e-26 

[SUPFAM] kinesin-related protein KIF3 4e-68 

[SUPFAM] kinesin-related protein KIF2 1e-46 

[SUPFAM] kinesin-related protein unc-104 7e-60 

[SUPFAM] unassigned kinesin-related proteins 6e-87 

[SUPFAM] centromere protein E 3e-54 

[SUPFAM] kinesin-related protein KLP61F 5e-57 

[SUPFAM] kinesin-related protein MKLP-1 2e-28 

[SUPFAM] pleckstrin repeat homology 7e-60 

[SUPFAM] kinesin-related protein KIF1B 4e-61 

[SUPFAM] kinesin motor domain homology 6e-87 

[SUPFAM] kinesin-related protein KLPA 1e-43 

[SUPFAM] kinesin-related protein nodA 1e-30 

[SUPFAM] kinesin-related protein Eg5 5e-59 

[PROSITE] ATP_GTP_A 1 

[PROSITE] KINESIN_MOTOR_DOMAIN1 1 

[PFAM] Kinesin motor domain 

[KW] Irregular 

[KW] 3D 

[KW] LOW_COMPLEXITY 8.57 % 



SEQ MSVTEEDLCHHMKVVVRVRPENTKEKAAGFHKVVHVVDKHILVFDPKQEEVSFFHGKKTT 

SEG 

3kar- TBEEE 

SEQ NQNVIKKQNKDLKFVFDAVFDETSTQSEVFEHTTKPILRSFLNGYNCTVLAYGATGAGKT 

SEG 

3kar- EEEETTTTTTEEEEEETEEETTTTCHHHHHHHHHH-HHHGGGGCCCEEEEEECTTTTCHH 

SEQ HTMLGSADEPGVMYLTMLHLYKCMDEIKEEKICSTAVSYLEVYNEQIRDLLVNSGPLAVR 

SEG 

3kar- HHHHTTTT — THHHHHHHHHHHHHHHHGGGCEEEEEEEEEEEETTEEEETT-TCCCCEEE 

SEQ EDTQKGVVVHGLTLHQPKSSEEILHLLDNGNKNRTQHPTDMNATSSRSHAVFQIYLRQQD 

SEG 

3kar- EETTTEEEEETTCCEEECCGGGHHHHHHHHHHHHCCTTTTCHHHHHHCEEEEEEEEEEEE 

SEQ KTASINQNVRIAKMSLIDLAGSERASTSGAKGTRFVEGTNINRSLLALGNVINALADSKR 

SEG 

3kar- TTTTCEE EEEEEEEECCCCCCCCCC HHHHHHHHHHHHHHHHHHHHHHHHTTTT 

SEQ KNQHIPYRNSKLTRLLKDSLGGNCQTIMIAAVSPSSVFYDDTYNTLKYANRAKDIKSSLK 

SEG xxxxx 

3kar- TTTCCTTTTTHHHHHHGGGCTTTTEEEEEEEECCCGGGHHHHHHHHHHHHH 

SEQ SNVLNVNNHITQYVKICNEQKAEILLLKEKLKAYEEQKAFTNENDQAKLMISNPQEKEIE 

SEG xxxxxxxx xxxxxxxxxxxxxxxxxxxxx 



3kar- 



SEQ RFQEILNCLFQNREEIRQEYLKLEMLLKENELKSFYQQQCHKQIEMMCSEDKVEKATGKR 

SEG xxxxxxxxxxxxx 

3kar- 

SEQ DHRLAMLKTRRSYLEKRREEELKQFDENTNWLHRVEKEMGLLSQNGHIPKELKKDLHCHH 

SEG xxxxxxxxxxx 

3kar- 

SEQ LHLQNKDLKAQIRHMMDLACLQEQQHRQTEAVLNALLPTLRKQYCTLKEAGLSNAAFESD 

SEG xxx 

3kar- 

SEQ FKEIEHLVERKKVVVWADQTAEQPKQNDLPGISVLMTFPQLGPVQPIPCCSSSGGTNLVK 

SEG 

3kar- 

SEQ IPTEKRTRRKLMPSPLKGQHTLKSPPSQSVQLNDSLSKELQPIVYTPEDCRKAFQNPSTV 

SEG 

3kar- 

SEQ TLMKPSSFTTSFQAISSNINSDNCLKMLCEVAIPHNRRKECGQEDLDSTFTICEDIKSSK 

SEG 

3kar- 

SEQ CKLPEQESLPNDNKDILQRLDPSSFSTKHSMPVPSMVPSYMAMTTAAKRKRKLTSSTSNS 

SEG xxxxxxxxxxxxx 

3kar- 
SEQ SLTADVNSGFAKRVRQDNSSEKHLQENKPTMEHKRNICKINPSMVRKFGRNISKGNLR 



SEG xxx 
3kar- 



Prosite for DKFZphtes3_26g22.1 

PS00017 113->121 ATP_GTP_A PDOC00017 

PS00411 252->264 KINESIN_MOTOR_DOMAIN1 PDOC00343 



Pfam for DKFZphtes3_26g22.1 
HMM_NAME Kinesin motor domain 

HMM *RCRPlNeREindgcscvVQWPpWtGyktvhnghegds 
R+RP N +E+++G +VV + + + + +++E S 
Query 17 RVRPENTKEKAAGFHKVVHVVD-KHILVFDPKQEEVSFFHGKKTTNQNV 64 

HMM phksFtFDHVFWWncTQedVYdtvAHPIVDDcFhGYNCTIFAYGQ 
+ F+FD VF+ ++TQ +V++ + PI+ ++++GYNCT++AYG 
Query 65 IKKQNKDLKFVFDAVFDETSTQSEVFEHTTKPILRSFLNGYNCTVLAYGA 114 

HMM TGSGKTYTMMGpggehPDHmGIIPRcCHDIFdrldkfqekDhdFWhVkCS 
TG+GKT+TM G + D+ G+ + +++++ D + + + +S 
Query 115 TGAGKTHTMLG SADEPGVMYLTMLHLYKCMDEIK-EEKIC-STAVS 158 

HMM YMEIYNEelYDLLCPnPqhMkpLnlHEHPNMGpYVqGCTEfHVcSYeDac 
Y+E+YNE+I+DLL+ N ++PL+++E+ G+ V G+T+ +S E+++ 
Query 159 YLEVYNEQIRDLLV-N SGPLAVREDTQKGVVVHGLTLHQPKSSEEIL 204 

HMM hWIWqGnknRHVAaTnMNdhSSRSHtlFTIHVeQrHk..qcdehvcHSKM 
H+++ GNKNR+ +T MN++SSRSH++F+I ++Q K + V++ KM 
Query 205 HLLDNGNKNRTQHPTDMNATSSRSHAVFQIYLRQQDKTASINQNVRIAKM 254 

HMM NLVDLAGSERvnrTGAEGQRlKEGcNINqSLllLGnVInaLaDgqTKYmY 
+L+DLAGSER++ +GA G+R+ EG+NIN+SL++LGNVINALAD + 
Query 255 SLIDLAGSERASTSGAKGTRFVEGTNINRSLLALGNVINALADSK 299 

HMM gghgHIPYRDSKLTWILQDSLGGNcKTcMIACIWPadWNYEETLSTLRYA 
+++HIPYR SKLT+LL+DSLGGNC T MIA+++P+ + Y++T +TL+YA 
Query 300 RKNQHIPYRNSKLTRLLKDSLGGNCQTIMIAAVSPSSVFYDDTYNTLKYA 349 

HMM dRAKnlkNkPQINEDPcamalWRrYheQIqdMKhqL* 
+RAK+IK +N++ ++Y+ + K++ 
Query 350 NRAKDIKSSLKSNVLNVN-NHITQYVKICNEQKAEI 384 
DKFZphtes3_27dl 



group: metabolism 

DKFZphtes3_27dl encodes a novel 712 amino acid protein similar to ubiquitin-specific protease 
(EC 3.1.2.15). 

The novel protein contains both ubiquitin carboxyl-terminal hydrolases family 2 signatures, 
signature 1 and signature 2. Pfam predicts a new member of the ubiquitin carboxyl-terminal 
hydrolases family 2. The ubiquitin system is responsible for the turnover of proteins. 
Ubiquitin carboxyl-terminal hydrolases (EC 3.1.2.15) (UCH) (deubiquitinating enzymes) are thiol 
proteases that recognize and hydrolyze the peptide bond at the C-terminal glycine of 
ubiquitin. These enzymes are involved in the processing of poly-ubiquitin precursors as well 
as that of ubiquitinated proteins. 

The novel protein is a new member of the ubiquitin carboxyl-terminal hydrolases family 2, 
represented by proteins such as yeast UBP1-16, human tre-2, human isopeptidase T and others. 

The novel protein can find application in modulation of ubiquitin and protein metabolism in 
cells. 



similarity to ubiquitin-specific proteases 

complete cDNA, complete cds, 4 EST hits 

Sequenced by GBF 

Locus : unknown 

Insert length: 2871 bp 

Poly A stretch at pos. 2836, no polyadenylation signal found 



1 CCAAACCTGA AAGAGGTTGA TTTGTAATGA TTTGCAGGGG GGCACTGGAG 
51 GCAGCGGCCA GGACTTTTCA CTTAGGAGAT CAGCATTTGC CCTGATGGAA 
101 ACTGGGCGAT CCTGCAGGGA CTGACCTCTG AGTTATCCAA AGGCCGACCT 
151 GGGGAAAGAC TGATTTTGAG GTTTTAATAG TTTTCAGATG CTTCAAGTGT 
201 TGTGAACAGA GACTTGTTTG GATTATGCAT TTCTCAGCTA GACTAAATAA 
251 ATGCTAGCAA TGGATACGTG CAAACATGTT GGGCAGCTGC AGCTTGCTCA 
301 AGACCATTCC AGCCTCAACC CTCAGAAATG GCACTGTGTG GACTGCAACA 
351 CGACCGAGTC CATTTGGGCT TGCCTTAGCT GCTCCCATGT TGCCTGTGGA 
401 AGATATATTG AAGAGCATGC ACTCAAGCAC TTTCAAGAAA GCAGTCATCC 
451 TGTTGCATTG GAGGTGAATG AGATGTACGT TTTTTGTTAC CTTTGTGATG 
501 ATTATGTTCT GAATGATAAC GCAACTGGAG ACCTGAAGTT ACTACGACGT 
551 ACATTAAGTG CCATCAAAAG TCAAAATTAT CACTGCACAA CTCGTAGTGG 
601 GAGGTTTTTA CGGTCCATGG GTACAGGTGA TGATTCTTAT TTCTTACATG 
651 ACGGTGCCCA ATCTCTGCTT CAAAGTGAAG ATCAACTGTA TACTGCTCTT 
701 TGGCACAGGA GAAGGATACT AATGGGTAAA ATCTTTCGAA CATGGTTTGA 
751 ACAATCACCC ATTGGAAGAA AAAAGCAAGA AGAACCATTT CAGGAGAAAA 
801 TAGTAGTAAA AAGAGAAGTA AAGAAAAGAC GGCAGGAATT GGAGTATCAA 
851 GTTAAAGCAG AATTGGAAAG TATGCCTCCA AGAAAGAGTT TACGTTTACA 
901 AGGGCTCGCT CAGTCGACCA TAATAGAAAT AGTTTCTGTT CAGGTGCCAG 
951 CACAAACGCC AGCATCACCA GCAAAAGATA AAGTACTCTC TACCTCAGAA 
1001 AATGAAATAT CTCAAAAAGT CAGTGACTCC TCAGTTAAAC GAAGGCCAAT 
1051 AGTAACTCCT GGTGTAACAG GATTGAGAAA TTTGGGAAAT ACTTGCTATA 
1101 TGAATTCTGT TCTTCAGGTG TTGAGTCATT TACTTATTTT TCGACAATGT 
1151 TTTTTAAAGC TTGATCTGAA CCAATGGCTG GCTATGACTG CTAGCGAGAA 
1201 GACAAGATCT TGTAAGCATC CACCAGTCAC AGATACAGTA GTATATCAAA 
1251 TGAATGAATG TCAGGAAAAA GATACAGGTT TTGTTTGCTC CAGACAATCA 
1301 AGTCTGTCAT CAGGACTAAG TGGTGGAGCA TCAAAAGGTA GAAAGATGGA 
1351 ACTTATTCAG CCAAAGGAGC CAACTTCACA GTACATTTCT CTTTGTCATG 
1401 AATTGCATAC TTTGTTCCAA GTCATGTGGT CTGGAAAGTG GGCGTTGGTC 
1451 TCACCATTTG CTATGCTACA CTCAGTGTGG AGACTCATTC CTGCCTTTCG 
1501 TGGTTACGCC CAACAAGACG CTCAGGAATT TCTTTGTGAA CTTTTAGATA 
1551 AAATACAACG TGAATTAGAG ACAACTGGTA CCAGTTTACC AGCTCTTATC 
1601 CCCACTTCTC AAAGGAAACT CATCAAACAA GTTCTGAATG TTGTAAATAA 
1651 CATTTTTCAT GGACAACTTC TTAGTCAGGT TACATGTCTT GCATGTGACA 
1701 ACAAATCAAA TACCATAGAA CCTTTCTGGG ACTTGTCATT GGAGTTTCCA 
1751 GAAAGGTATC AATGCAGTGG AAAAGATATT GCTTCCCAGC CATGTCTGGT 
1801 TACTGAAATG TTGGCCAAAT TTACAGAAAC TGAAGCTTTA GAAGGAAAAA 
1851 TCTACGTATG TGACCAGTGT AACTCAAAGC GTAGAAGGTT TTCCTCCAAA 
1901 CCAGTTGTAC TCACAGAAGC CCAGAAACAA CTTATGATAT GCCACCTACC 
1951 TCAGGTTCTC AGACTGCACC TCAAACGATT CAGGTGGTCA GGACGTAATA 
2001 ACCGAGAGAA GATTGGTGTT CATGTTGGCT TTGAGGAAAT CTTAAACATG 
2051 GAGCCCTATT GCTGCAGGGA GACCCTGAAA TCCCTCAGAC CAGAATGCTT 
2101 TATCTATGAC TTGTCCGCGG TGGTGATGCA CCATGGGAAA GGATTTGGCT 
2151 CAGGGCACTA CACTGCCTAC TGCTATAATT CTGAAGGAGG GTTCTGGGTA 
2201 CACTGCAATG ATTCCAAACT AAGCATGTGC ACTATGGATG AAGTATGCAA 
2251 GGCTCAAGCT TATATCTTGT TTTATACCCA ACGAGTTACT GAGAATGGAC 
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2301 ATTCTAAACT TTTGCCTCCA GAGCTCCTGT TGGGGAGCCA ACATCCCAAT 
2351 GAAGACGCTG ATACCTCGTC TAATGAAATC CTTAGCTGAT CCAAAGACAA 
2401 TGGGGTTTTC TTCCTGTGAT TTATATATAT ACTTTTTAAA AGACTGATGT 
24 51 ACCATTTTAA ACTTCATTTT TTCTTGTGAA TCAGTGTATA CTACATTTAT 
2501 ACATTTTATA TCTAACAATT tTTTTTTTTT ACAAAGTATA AATGTATATA 
2551 TCAACTGAAG GTAACTACTT TTTTCATATT TGGAGTTTTA AACTTTTGGT 
2601 GTTTACCTCA GACTGATGTT ACCTCTTTTA TATTTTTATG TCTTAATTGG 
2651 CTCGGATGAT GAACTTGTGC AATCTTCTAC CAACAAAGTT CAAGTGGCAT 
2701 CATTTTATAT ACATGTATCT TTTTCAGGTA TTTTCTATAC AAATTCTTAA 
2751 TAGATGGAAA ATTAGACTCT AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2801 AAAAAAAAAA AAAAAAAAAA AAGGGGCGGC CGCTCTAAAA AAAAAAAAAA 
2851 AAAAAAAAAA AAAAAAAAAG G 



BLAST Results 



NO BLAST result 



Medline entries 



98072201: 

Regulation of ubiquitin-dependent processes by deubiquitinating 
enzymes. 

98431658: 

The ubiquitin system. 



Peptide information for frame 2 



ORF from 251 bp to 2386 bp; peptide length: 712 
Category: similarity to known protein 
Prosite motifs: UCH_2_1 (274-290) 
UCH_2_2 (619-638) 
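
The relationship between the ORF coordinates and the peptide length quoted above can be checked with a short calculation. The sketch below assumes that the coordinates are 1-based and inclusive and that the stop codon is not counted; the function name `peptide_length` is hypothetical and is not part of the annotation pipeline that produced this report.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of codons in an ORF given 1-based inclusive coordinates (assumed)."""
    n_nt = orf_end - orf_start + 1
    if n_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return n_nt // 3


# ORF quoted above for frame 2: 251 bp to 2386 bp
print(peptide_length(251, 2386))
```

Under these assumptions the result agrees with the stated peptide length of 712 residues.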



1 MLAMDTCKHV GQLQLAQDHS SLNPQKWHCV DCNTTESIWA CLSCSHVACG 
51 RYIEEHALKH FQESSHPVAL EVNEMYVFCY LCDDYVLNDN ATGDLKLLRR 
101 TLSAIKSQNY HCTTRSGRFL RSMGTGDDSY FLHDGAQSLL QSEDQLYTAL 
151 WHRRRILMGK IFRTWFEQSP IGRKKQEEPF QEKIVVKREV KKRRQELEYQ 
201 VKAELESMPP RKSLRLQGLA QSTIIEIVSV QVPAQTPASP AKDKVLSTSE 
251 NEISQKVSDS SVKRRPIVTP GVTGLRNLGN TCYMNSVLQV LSHLLIFRQC 
301 FLKLDLNQWL AMTASEKTRS CKHPPVTDTV VYQMNECQEK DTGFVCSRQS 
351 SLSSGLSGGA SKGRKMELIQ PKEPTSQYIS LCHELHTLFQ VMWSGKWALV 
401 SPFAMLHSVW RLIPAFRGYA QQDAQEFLCE LLDKIQRELE TTGTSLPALI 
451 PTSQRKLIKQ VLNVVNNIFH GQLLSQVTCL ACDNKSNTIE PFWDLSLEFP 
501 ERYQCSGKDI ASQPCLVTEM LAKFTETEAL EGKIYVCDQC NSKRRRFSSK 
551 PVVLTEAQKQ LMICHLPQVL RLHLKRFRWS GRNNREKIGV HVGFEEILNM 
601 EPYCCRETLK SLRPECFIYD LSAVVMHHGK GFGSGHYTAY CYNSEGGFWV 
651 HCNDSKLSMC TMDEVCKAQA YILFYTQRVT ENGHSKLLPP ELLLGSQHPN 
701 EDADTSSNEI LS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_27dl, frame 2 

PIR:S57591 hypothetical protein YMR223w - yeast (Saccharomyces 
cerevisiae), N = 4, Score = 218, P = 8.4e-38 

SWISSPROT:UBPB_HUMAN UBIQUITIN CARBOXYL-TERMINAL HYDROLASE 11 (EC 
3.1.2.15) (UBIQUITIN THIOLESTERASE 11) (UBIQUITIN-SPECIFIC PROCESSING 
PROTEASE 13) (DEUBIQUITINATING ENZYME 11) (KIAA0055)., N = 2, Score = 
300, P = 9.3e-31 

TREMBL:AF079565_1 gene: "Ubp41"; product: "ubiquitin-specific protease 
UBP41"; Mus musculus ubiquitin-specific protease UBP41 (Ubp41) mRNA, 
complete cds., N = 3, Score = 187, P = 8.7e-30 

PIR: 158376 hypothetical protein unp - mouse, N = 3, Score = 214, P = 
1.2e-28 

>SWISSPROT:UBPB_HUMAN UBIQUITIN CARBOXYL-TERMINAL HYDROLASE 11 (EC 3.1.2.15) 
(UBIQUITIN THIOLESTERASE 11) (UBIQUITIN-SPECIFIC PROCESSING PROTEASE 13) 
(DEUBIQUITINATING ENZYME 11) (KIAA0055). 
Length = 1,118 

HSPs: 



Score = 300 (45.0 bits), Expect = 9.3e-31, Sum P(2) = 9.3e-31 
Identities = 95/301 (31%), Positives = 149/301 (49%) 

Query: 381 LCHELHTLFQVMWSGKWALVSPFAMLHSVWRLIPAFRGYAQQDAQEFLCELLDKIQREL- 439 
+ E + + +W+G++ +SP ++ ++ F GY+QQD+QE L L+D + +L 
Sbjct: 826 VAEEFGIIMKALWTGQYRYISPKDFKITIGKINDQFAGYSQQDSQELLLFLMDGLHEDLN 885 

Query: 440 ET TGTSLPALIPTSQRKLIKQVLN--VVNNIFHGQLLSQVTCLACDNKSNT 488 
E L + LN ++ +F GQ S V CL C KS T 
Sbjct: 886 KADNRKRYKEENNDHLDDFKAAEHAWQKHKQLNESIIVALFQGQFKSTVQCLTCHKKSRT 945 

Query: 489 IEPFWDLSLEFPERYQCSGKDIASQPCLVTEMLAKFTETEALEGKIYVCDQCNSKRRRFS 548 
E F LSL +C+ +D CL + +K E + + + C C ++R 
Sbjct: 946 FEAFMYLSLPLASTSKCTLQD CL--RLFSK--EEKLTDNNRFYCSHCRARR 992 

Query: 549 SKPVVLTEAQKQLMICHLPQVLRLHLKRFRWSGRNNREKIGVHVGFE-EILNMEPYCC-- 605 
++ K++ I LP VL +HLKRF + GR ++K+ V F E L++ Y 
Sbjct: 993 DSLKKIEIWKLPPVLLVHLKRFSYDGRW-KQKLQTSVDFPLENLDLSQYVIGP 1044 

Query: 606 RETLKSLRPECFIYDLSAVVMHHGKGFGSGHYTAYCYNSEGGFWVHCNDSKLSMCTMDEV 665 
+ LK Y+L +V H+G G GHYTAYC N+ W +D ++S ++ V 
Sbjct: 1045 KNNLKK YNLFSVSNHYG-GLDGGHYTAYCKNAARQRWFKFDDHEVSDISVSSV 1096 

Query: 666 CKAQAYILFYTQ---RVTE 681 
+ AYILFYT RVT+ 
Sbjct: 1097 KSSAAYILFYTSLGPRVTD 1115 

Score = 126 (18.9 bits), Expect = 9.3e-31, Sum P(2) = 9.3e-31 
Identities = 41/116 (35%), Positives = 63/116 (54%) 

Query: 200 QVKAELESMPPR--KSLRLQGLAQSTIIEIVSVQVPAQTPASPAKDKVLSTSENEISQKV 257 
Q+ AE + P + +S + Q+ I+ + P TP ++K + EIS ++ 
Sbjct: 701 QIPAERDREPSKLKRSYSSPDITQA--IQEEEKRKPTVTPTVNRENKPTCYPKAEIS-RL 757 

Query: 258 SDSSVKR-RPIVT PGVTGLRNLGNTCYMNSVLQVLS HLLIF--RQCFLKLDLNQ 308 
S S ++ P+ P +TGLRNLGNTCYMNS+LQ L HL + R C+ D+N+ 
Sbjct: 758 SASQIRNLNPVFGGSGPALTGLRNLGNTCYMNSILQCLCNAPHLADYFNRNCYQD-DINR 816 

Score = 50 (7.5 bits), Expect = 8.3e-23, Sum P(2) = 8.3e-23 
Identities = 29/106 (27%), Positives = 51/106 (48%) 

Query: 173 RKKQEEPFQEKIVVKREVKKRRQELEYQVKAELESMPPRKSLRLQGLAQSTIIEIVSVQV 232 
+ KQE+ +E+ +++ K R++E E+K+E+ + QA+++SQ 
Sbjct: 475 KNKQEKELRERQQEEQKEKLRKEEQEQKAKKKQEA-EENEITEKQQKAKEEMEKKESEQA 533 

Query: 233 PAQ TPASPAKD KVLSTSENEIS-- QKVSDSSVKRRPIVTPGV 272 
+ T A K+ K S SE+E S +K + KR P TP + 
Sbjct: 534 KKEDKETSAKRGKEITGVKRQSKSEHETSDAKKSVEDRGKRCP--TPEI 580 

Score = 42 (6.3 bits), Expect = 5.7e-22, Sum P(2) = 5.7e-22 
Identities = 13/58 (22%), Positives = 27/58 (46%) 

Query: 167 EQSPIGRKKQEEPFQEKIVVKREVKKRRQELEY-QVKAELESMPPRKSLRLQGLAQST 223 
EQ +KKQE E +++ K+ ++ E Q K E + ++ + G+ + + 
Sbjct: 498 EQEQKAKKKQEAEENEITEKQQKAKEEMEKKESEQAKKEDKETSAKRGKEITGVKRQS 555 



Pedant information for DKFZphtes3_27dl, frame 2 



Report for DKFZphtes3_27dl .2 



[LENGTH] 712 

[MW] 81155.71 

[pi] 8.21 

[HOMOL] SWISSPROT:UBPB_HUMAN UBIQUITIN CARBOXYL-TERMINAL HYDROLASE 11 (EC 3.1.2.15) (UBIQUITIN THIOLESTERASE 11) (UBIQUITIN-SPECIFIC PROCESSING PROTEASE 13) (DEUBIQUITINATING ENZYME 11) (KIAA0055). 4e-32 

[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YMR223w] 5e-33 

[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YMR223w] 5e-33 

[FUNCAT] 06.13 proteolysis [S. cerevisiae, YBL067c] 3e-19 

[FUNCAT] 10.03.99 other osmosensing activities [S. cerevisiae, YDR069c] 2e-17 

[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YDR069c] 2e-17 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YDR069c] 2e-17 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDR069c] 2e-17 

[FUNCAT] 09.25 vacuolar and lysosomal biogenesis [S. cerevisiae, YDR069c] 2e-17 

[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YNL186w] 4e-17 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YHL010c] 3e-12 

[BLOCKS] BL00970A Nuclear transition protein 2 proteins 

[BLOCKS] BL00972D 

[BLOCKS] BL00972C 

[BLOCKS] BL00972B 

[BLOCKS] BL00972A 

[EC] 3.1.2.15 Ubiquitin thiolesterase 5e-06 

[PIRKW] alternative splicing 2e-11 

[PIRKW] thiolester hydrolase 5e-06 

[PIRKW] hydrolase 1e-14 

[SUPFAM] RING finger homology 7e-11 

[SUPFAM] deubiquinating enzyme SSV7 5e-16 

[PROSITE] MYRISTYL 5 

[PROSITE] AMIDATION 2 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 10 

[PROSITE] TYR_PHOSPHO_SITE 2 

[PROSITE] UCH_2_2 1 

[PROSITE] PKC_PHOSPHO_SITE 17 

[PROSITE] ASN_GLYCOSYLATION 4 

[PROSITE] UCH_2_1 1 

[PFAM] Ubiquitin carboxyl-terminal hydrolases family 2 

[PFAM] Ubiquitin carboxyl-terminal hydrolases family 2 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 4.92 % 

SEQ MLAMDTCKHVGQLQLAQDHSSLNPQKWHCVDCNTTESIWACLSCSHVACGRYIEEHALKH 

SEG 

PRD ccccccccchhhhhhhhcccccccccceeecccceeeeeeeccccccccchhhhhhhhhh 

SEQ FQESSHPVALEVNEMYVFCYLCDDYVLNDNATGDLKLLRRTLSAIKSQNYHCTTRSGRFL 

SEG 

PRD hhhhccceeecccceeeeeeccccccccccccchhhhhhhhhhhhhcccceeeccccccc 

SEQ RSMGTGDDSYFLHDGAQSLLQSEDQLYTALWHRRRILMGKIFRTWFEQSPIGRKKQEEPF 

SEG 

PRD cccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhchhhhhhhhhhh 

SEQ QEKIVVKREVKKRRQELEYQVKAELESMPPRKSLRLQGLAQSTIIEIVSVQVPAQTPASP 

SEG xxxxxxxxxxxxxxxx 

PRD hheeehhhhhhhhhhhhhhhhhhhhhhcccccccccccccccceeeeeeccccccccccc 

SEQ AKDKVLSTSENEISQKVSDSSVKRRPIVTPGVTGLRNLGNTCYMNSVLQVLSHLLIFRQC 

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhh 

SEQ FLKLDLNQWLAMTASEKTRSCKHPPVTDTVVYQMNECQEKDTGFVCSRQSSLSSGLSGGA 

SEG xxxxxxxxxxxxxx 

PRD hhhhhhchhhhhhhhhhhhhhccccccceeehhhhhcccccccccccccccccccccccc 

SEQ SKGRKMELIQPKEPTSQYISLCHELHTLFQVMWSGKWALVSPFAMLHSVWRLIPAFRGYA 

SEG xxxxx 

PRD ccccceeecccccccchhhhhhhhhhhhhhhhhccceeeeccchhhhhhhhhhhccccch 

SEQ QQDAQEFLCELLDKIQRELETTGTSLPALIPTSQRKLIKQVLNVVNNIFHGQLLSQVTCL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhhhhhhhhccccchhhhhhhhc 

SEQ ACDNKSNTIEPFWDLSLEFPERYQCSGKDIASQPCLVTEMLAKFTETEALEGKIYVCDQC 

SEG 

PRD cccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhccceeecccc 

SEQ NSKRRRFSSKPVVLTEAQKQLMICHLPQVLRLHLKRFRWSGRNNREKIGVHVGFEEILNM 

SEG 

PRD ccccccccccchhhhhhhhhhhhhhchhhhhhhhhhhhhcccccccccceeeeccccccc 

SEQ EPYCCRETLKSLRPECFIYDLSAVVMHHGKGFGSGHYTAYCYNSEGGFWVHCNDSKLSMC 

SEG 

PRD ccccccccccccccceeeeeeeeeeeecccccccccceeeeccccccceeeecccccccc 

SEQ TMDEVCKAQAYILFYTQRVTENGHSKLLPPELLLGSQHPNEDADTSSNEILS 

SEG 

PRD cchhhhhhhhhhhhhheeeecccccccccccccccccccccccccccc 

Prosite for DKFZphtes3_27dl.2 

PS00001 33->37 ASN_GLYCOSYLATION PDOC00001 
PS00001 90->94 ASN_GLYCOSYLATION PDOC00001 
PS00001 484->488 ASN_GLYCOSYLATION PDOC00001 
PS00001 653->657 ASN_GLYCOSYLATION PDOC00001 
PS00004 545->549 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 6->9 PKC_PHOSPHO_SITE PDOC00005 
PS00005 113->116 PKC_PHOSPHO_SITE PDOC00005 
PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005 
PS00005 213->216 PKC_PHOSPHO_SITE PDOC00005 
PS00005 254->257 PKC_PHOSPHO_SITE PDOC00005 
PS00005 261->264 PKC_PHOSPHO_SITE PDOC00005 
PS00005 315->318 PKC_PHOSPHO_SITE PDOC00005 
PS00005 320->323 PKC_PHOSPHO_SITE PDOC00005 
PS00005 394->397 PKC_PHOSPHO_SITE PDOC00005 
PS00005 453->456 PKC_PHOSPHO_SITE PDOC00005 
PS00005 506->509 PKC_PHOSPHO_SITE PDOC00005 
PS00005 542->545 PKC_PHOSPHO_SITE PDOC00005 
PS00005 548->551 PKC_PHOSPHO_SITE PDOC00005 
PS00005 580->583 PKC_PHOSPHO_SITE PDOC00005 
PS00005 608->611 PKC_PHOSPHO_SITE PDOC00005 
PS00005 611->614 PKC_PHOSPHO_SITE PDOC00005 
PS00005 676->679 PKC_PHOSPHO_SITE PDOC00005 
PS00006 125->129 CK2_PHOSPHO_SITE PDOC00006 
PS00006 164->168 CK2_PHOSPHO_SITE PDOC00006 
PS00006 223->227 CK2_PHOSPHO_SITE PDOC00006 
PS00006 247->251 CK2_PHOSPHO_SITE PDOC00006 
PS00006 249->253 CK2_PHOSPHO_SITE PDOC00006 
PS00006 313->317 CK2_PHOSPHO_SITE PDOC00006 
PS00006 506->510 CK2_PHOSPHO_SITE PDOC00006 
PS00006 525->529 CK2_PHOSPHO_SITE PDOC00006 
PS00006 661->665 CK2_PHOSPHO_SITE PDOC00006 
PS00006 706->710 CK2_PHOSPHO_SITE PDOC00006 
PS00007 193->200 TYR_PHOSPHO_SITE PDOC00007 
PS00007 192->200 TYR_PHOSPHO_SITE PDOC00007 
PS00008 218->224 MYRISTYL PDOC00008 
PS00008 355->361 MYRISTYL PDOC00008 
PS00008 359->365 MYRISTYL PDOC00008 
PS00008 471->477 MYRISTYL PDOC00008 
PS00008 589->595 MYRISTYL PDOC00008 
PS00009 171->175 AMIDATION PDOC00009 
PS00009 362->366 AMIDATION PDOC00009 
PS00972 274->290 UCH_2_1 PDOC00750 
PS00973 619->638 UCH_2_2 PDOC00750 



Pfam for DKFZphtes3_27dl .2 



HMM_NAME Ubiquitin carboxyl-terminal hydrolases family 2 

HMM *GIqNlGNTCYMNSIIQCL* 

G++NLGNTCYMNS++Q+L 
Query 274 GLRNLGNTCYMNSVLQVL 291 



HMM_NAME Ubiquitin carboxyl-terminal hydrolases family 2 

HMM * YdLYgVICHYGntldyGHYWa YVKNenhHRWJcWYYFDDEtV* 

YDL +V+ H+G + ++GHY+AY++N + ++W+ +D++ 
Query 619 YDLSAVVMHHGKGFGSGHYTAYCYNSE--GGFWVHCNDSKL 657 



group: transmembrane protein 

Summary DKFZphtes3_27k4 encodes a novel 490 amino acid protein with similarity to two 
hypothetical C. elegans proteins. 

The novel protein contains 10 transmembrane regions and a leucine zipper. It is a member of 
a new family of proteins containing 10 transmembrane domains that is specific to multicellular 
eukaryotes. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes and as a new marker for testicular cells. 



strong similarity to C. elegans K07H8.2/ZK185.2 
membrane regions: 10 

complete cDNA, complete cds potential start at Bp 109, few EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 1901 bp 

Poly A stretch at pos. 1866, no polyadenylation signal found 



1 GTGATTTACC AGAAAAACCA AGAAGACAGG CACAAAAAAG CAAACGGCAT 
51 TTGGCAAGAT GGATTATCAA CTGCAGTACA GACTTTTAGT AATAGATCTG 
101 AGCAACACAT GGAGTATCAC AGTTTCTCAG AGCAGTCTTT TCATGCCAAT 
151 AATGGGCACG CATCATCAAG CTGCAGCCAA AAGTATGATG ACTATGCCAA 
201 TTATAATTAC TGTGATGGAA GGGAGACTTC AGAAACCACT GCCATGTTAC 
251 AAGATGAAGA TATATCTAGT GATGGTGATG AAGATGCTAT TGTAGAAGTG 
301 ACCCCAAAAT TACCAAAGGA ATCCAGTGGC ATCATGGCAT TGCAAATACT 
351 TGTGCCCTTT TTGCTAGCTG GTTTTGGAAC AGTTTCAGCT GGCATGGTAC 
401 TGGATATAGT ACAGCACTGG GAGGTGTTCA GAAAAGTTAC AGAAGTTTTC 
451 ATTTTAGTCC CTGCACTTCT TGGTCTCAAA GGGAACTTGG AAATGACATT 
501 GGCATCCAGA TTATCCACTG CAGTAAATAT TGGGAAGATG GATTCACCCA 
551 TTGAAAAGTG GAACCTAATA ATTGGCAACT TGGCTTTAAA GCAGGTTCAG 
601 GCAACAGTAG TGGGTTTTCT AGCAGCTGTG GCAGCAATTA TATTGGGCTG 
651 GATTCCAGAA GGAAAATATT ACCTTGATCA TTCCATACTT CTGTGCTCTA 
701 GCAGTGTGGC AACTGCCTTC ATTGCATCTC TTCTGCAGGG AATAATAATG 
751 GTTGGGGTTA TCGTTGGTTC AAAGAAGACT GGTATAAATC CTGATAATGT 
801 TGCTACACCC ATTGCTGCTA GTTTTGGCGA CCTTATAACT CTTGCCATAT 
851 TGGCTTGGAT AAGTCAGGGC TTATACTCCT GTCTTGAGAC CTATTACTAC 
901 ATTTCTCCAT TAGTTGGTGT ATTTTTCTTG GCTCTAACCC CTATTTGGAT 
951 TATAATAGCT GCCAAACATC CAGCCACAAG AACAGTTCTC CACTCAGGCT 
1001 GGGAGCCTGT CATAACAGCT ATGGTTATAA GTAGCATTGG GGGCCTTATT 
1051 CTGGACACAA CTGTATCAGA CCCAAACTTG GTTGGGATTG TTGTTTACAC 
1101 GCCAGTTATT AATGGTATTG GTGGTAATTT GGTGGCCATT CAGGCTAGCA 
1151 GGATTTCTAC CTACCTCCAT TTACATAGCA TTCCAGGAGA ATTGCCTGAT 
1201 GAACCCAAAG GTTGTTACTA CCCATTTAGA ACTTTCTTTG GTCCAGGAGT 
1251 AAATAATAAG TCTGCTCAAG TTCTACTGCT TTTAGTGATT CCTGGACATT 
1301 TAATTTTCCT CTACACTATT CATTTGATGA AAAGTGGTCA TACTTCTTTA 
1351 ACTATAATCT TCATAGTAGT GTATTTATTT GGCGCTGTGT TACAGGTATT 
1401 TACCTTGCTG TGGATTGCTG ACTGGATGGT CCATCACTTC TGGAGGAAAG 
1451 GAAAGGACCC GGATAGTTTC TCCATCCCCT ACCTAACAGC ATTGGGTGAT 
1501 CTGCTCGGGA CAGCTCTGTT AGCCTTAAGT TTTCATTTTC TTTGGCTTAT 
1551 TGGAGATCGA GATGGAGATG TTGGAGACTA ATAAATTCTA CAAACTGCTC 
1601 TCAAGTTACC AAGGAAGAAA ATACACGACA ACCACTTATG GCTCTTTTTC 
1651 AAAACTCTTA AATCAGTAGT TTGACTTTTG CCAGGGTAAT CTTCAGTTGG 
1701 CCCTGATTCA ATTAAATGGC CTTAATTTTT TTTTAAGGAA TTTGTGTCAA 
1751 AACCAGAATG AAGAGTATTC GTGCTGCTTT TCATAGAATA AATGATAATT 
1801 TGACATAGAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
1851 AAAAAAAAAA AAGGGGAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAGGG 
1901 G 



BLAST Results 



No BLAST result 

Medline entries 

No Medline entry 



Peptide information for frame 1 



ORF from 109 bp to 1578 bp; peptide length: 490 
Category: similarity to unknown protein 



1 MEYHSFSEQS FHANNGHASS SCSQKYDDYA NYNYCDGRET SETTAMLQDE 
51 DISSDGDEDA IVEVTPKLPK ESSGIMALQI LVPFLLAGFG TVSAGMVLDI 
101 VQHWEVFRKV TEVFILVPAL LGLKGNLEMT LASRLSTAVN IGKMDSPIEK 
151 WNLIIGNLAL KQVQATVVGF LAAVAAIILG WIPEGKYYLD HSILLCSSSV 
201 ATAFIASLLQ GIIMVGVIVG SKKTGINPDN VATPIAASFG DLITLAILAW 
251 ISQGLYSCLE TYYYISPLVG VFFLALTPIW IIIAAKHPAT RTVLHSGWEP 
301 VITAMVISSI GGLILDTTVS DPNLVGIVVY TPVINGIGGN LVAIQASRIS 
351 TYLHLHSIPG ELPDEPKGCY YPFRTFFGPG VNNKSAQVLL LLVIPGHLIF 
401 LYTIHLMKSG HTSLTIIFIV VYLFGAVLQV FTLLWIADWM VHHFWRKGKD 
451 PDSFSIPYLT ALGDLLGTAL LALSFHFLWL IGDRDGDVGD 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_27k4, frame 1 

TREMBL:AF036704_2 gene: "ZK185.2"; Caenorhabditis elegans cosmid 
ZK185., N = 1, Score = 730, P = 3.1e-72 

TREMBL:AF047659_9 gene: "K07H8.2"; Caenorhabditis elegans cosmid 
K07H8., N = 1, Score = 940, P = 1.7e-94 



>TREMBL:AF047659_9 gene: "K07H8.2"; Caenorhabditis elegans cosmid K07H8. 
Length = 507 

HSPs: 

Score = 940 (141.0 bits), Expect = 1.7e-94, P = 1.7e-94 
Identities = 204/412 (49%), Positives = 271/412 (65%) 

Query: 68 LPKESSGIMALQILVPFLLAGFGTVSAGMVLDIVQHWEVFRKVTEVFILVPALLGLKGNL 127 
+P ESS ++ Q+L PF +AG G V AG+VL IV W +F ++ E+ ILVPALLGLKGNL 
Sbjct: 82 IPAESSYVLFFQVLFPFAVAGLGMVFAGLVLSIVVTWPLFEEIPEILILVPALLGLKGNL 141 

Query: 128 
EMTLASRLST N+G MDS ++ +++I NLAL QVQATVV FLA+ A L +IP G + 
Sbjct: 142 

Query: 188 
H X+C+SS+ATA ASL+ ++MV VIV S+K INPDNVATPIAAS GDL TL + 
Sbjct: 202 DWAHGALMCASSLATACSASLVLSLLMVVVIVTSRKYNINPDNVATPIAASLGDLTTLTV 261 

Query: 248 
LA+ T +++ +V V FL L P WI IA ++ T+ L++GW PVI +M+I 
Sbjct: 262 321 

Query: 308 
SS GG IL+T V + Y PV+NG+GGNL A+QASR+STY H G LP+E 
Sbjct: 322 -SLSTYGPVLNGVGGNLAAVQASRLSTYFHKAGTVGVLPNEWT 379 

Query: 368 -RTFFGPGVNNKSAQVLLLLVIPGHLIFLYTIHLM- -KSGHTSLTIIFIVV 421 
R FF +++SA+VLLLLV+PGH+ F + I L K+ T +F + 
Sbjct: 380 

Query: 422 
Y+ A++QV LL++ +V DPD+ I PYLTALGDLLGT LL + F 
Sbjct: 440 
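
The percentages printed next to the identity and positive counts in these HSP headers appear to be truncated rather than rounded (for example, 204/412 is reported as 49%, not 50%). The following minimal sketch reproduces that convention; it is an inference from the figures in this report, and the helper name `blast_percent` is hypothetical.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Integer percentage, truncated toward zero (assumed reporting convention)."""
    return (100 * matches) // aligned


# Counts taken from the K07H8.2 HSP header above
print(blast_percent(204, 412), blast_percent(271, 412))
```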



Pedant information for DKFZphtes3_27k4, frame 1 



Report for DKFZphtes3_27k4 . 1 



[LENGTH] 490 
[MW] 53266.39 
[pi] 5.29 
[HOMOL] TREMBL:AF047659_9 gene: "K07H8.2"; Caenorhabditis elegans cosmid K07H8. 4e 
[PROSITE] LEUCINE_ZIPPER 1 
[PROSITE] MYRISTYL 7 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 7 
[PROSITE] PROKAR_LIPOPROTEIN 2 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 3 
[PROSITE] ASN_GLYCOSYLATION 1 
[KW] TRANSMEMBRANE 10 
[KW] LOW_COMPLEXITY 3.06 % 



SEQ MEYHSFSEQSFHANNGHASSSCSQKYDDYANYNYCDGRETSETTAMLQDEDISSDGDEDA 

SEG 

PRD cccccccceeeccccccccccccccccccceeecccccccchhhhhhhhcccccccccee 

MEM 

SEQ IVEVTPKLPKESSGIMALQILVPFLLAGFGTVSAGMVLDIVQHWEVFRKVTEVFILVPAL 

SEG 

PRD eeeeeccccccchhhhhhhhhhhhhhhcccchhhhhhhhhcchhhhhcccceeeeeeccc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM . MMMMMMMMMMMMMMM 

SEQ LGLKGNLEMTLASRLSTAVNIGKMDSPIEKWNLIIGNLALKQVQATVVGFLAAVAAIILG 

SEG 

PRD ccccchhhhhhhhhhhhhhccccccccccceeeehhhhhhhhhhhhhhhhhhhhhhhhhh 

MEM MMMMMMM MMMMMMMMMMMMMMMMM 

SEQ WIPEGKYYLDHSILLCSSSVATAFIASLLQGIIMVGVIVGSKKTGINPDNVATPIAASFG 

SEG 

PRD hcccceeecccceeehhhhhhhhhhhhhhhhhhhhheeeecccccccccccccccccccc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM MMMMMM 

SEQ DLITLAILAWISQGLYSCLETYYYISPLVGVFFLALTPIWIIIAAKHPATRTVLHSGWEP 

SEG 

PRD cchhhhhhhhhhhhhhhhcceeeeehhhhhhhhhhchhhhhhhhccccccccchhhhhhh 

MEM MMMMMMMMMMMMMMM .... MMMMMMMMMMMMMMMMMMMMM MMMMMM 

SEQ VITAMVISSIGGLILDTTVSDPNLVGIVVYTPVINGIGGNLVAIQASRISTYLHLHSIPG 

SEG 

PRD hcchhhhhhcceeeeccccccccceeeeeeceeeecccccceeeeehhhhhhhhhhcccc 

MEM MMMMMMMMMMMMMMMM _ 

SEQ ELPDEPKGCYYPFRTFFGPGVNNKSAQVLLLLVIPGHLIFLYTIHLMKSGHTSLTIIFIV 

SEG 

PRD cccccccccccceeeeeccccchhhhhhhhhhccccchhhhhhhhcccccccceeeehhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM . . . MMMMMMM 

SEQ VYLFGAVLQVFTLLWIADWMVHHFWRKGKDPDSFSIPYLTALGDLLGTALLALSFHFLWL 

SEG xxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccceeeeeecchhhhhhhhhhhhheeee 

MEM MMMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMM 

SEQ IGDRDGDVGD 

SEG 

PRD eecccccccc 

MEM MM 
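
The LOW_COMPLEXITY percentage in the report above is consistent with the number of residues masked on the SEG lines (the runs of `x`) divided by the peptide length. The sketch below illustrates that reading; it is an interpretation of the figures shown here rather than a description of how PEDANT computes the value, and the function name is hypothetical.

```python
def low_complexity_percent(seg_run_lengths, protein_length):
    """Percentage of residues masked by SEG, rounded to two decimals (assumed)."""
    masked = sum(seg_run_lengths)
    return round(100.0 * masked / protein_length, 2)


# One masked run of 15 residues in the 490-residue peptide above
print(low_complexity_percent([15], 490))
```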



Prosite for DKFZphtes3_27k4 . 1 



PS00001 383->387 ASN_GLYCOSYLATION PDOC00001 
PS00004 108->112 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 23->26 PKC_PHOSPHO_SITE PDOC00005 
PS00005 65->68 PKC_PHOSPHO_SITE PDOC00005 
PS00005 221->224 PKC_PHOSPHO_SITE PDOC00005 
PS00006 5->9 CK2_PHOSPHO_SITE PDOC00006 
PS00006 54->58 CK2_PHOSPHO_SITE PDOC00006 
PS00006 146->150 CK2_PHOSPHO_SITE PDOC00006 
PS00006 238->242 CK2_PHOSPHO_SITE PDOC00006 
PS00006 257->261 CK2_PHOSPHO_SITE PDOC00006 
PS00006 296->300 CK2_PHOSPHO_SITE PDOC00006 
PS00006 318->322 CK2_PHOSPHO_SITE PDOC00006 
PS00007 25->33 TYR_PHOSPHO_SITE PDOC00007 
PS00008 90->96 MYRISTYL PDOC00008 
PS00008 122->128 MYRISTYL PDOC00008 
PS00008 216->222 MYRISTYL PDOC00008 
PS00008 220->226 MYRISTYL PDOC00008 
PS00008 254->260 MYRISTYL PDOC00008 
PS00008 336->342 MYRISTYL PDOC00008 
PS00008 339->345 MYRISTYL PDOC00008 
PS00013 12->23 PROKAR_LIPOPROTEIN PDOC00013 
PS00013 248->259 PROKAR_LIPOPROTEIN PDOC00013 
PS00029 459->481 LEUCINE_ZIPPER PDOC00029 



(No Pfam data available for DKFZphtes3_27k4.1) 

DKFZphtes3_27ol4 



group: testes derived 

DKFZphtes3_27ol4 encodes a novel 358 amino acid protein with similarity to C. elegans cosmid 
C55A6. 

The new protein contains a C3HC4 zinc finger (RING finger) signature. The ring finger 
structure binds two atoms of zinc, and is involved in mediating protein-protein interactions. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to C. elegans C55A6.1 
complete cDNA, complete cds, EST hits 
Sequenced by GBF 
Locus: /map="6" 
Insert length: 2158 bp 

Poly A stretch at pos. 2137, polyadenylation signal at pos. 2120 



1 CCGAGGCCAG AGAGAAAAGA CTGCGAGGTG GCCGCAGCTG TGGCCGGAGA 
51 GCACAAAGAA TGAACCAGCA GTGGAAGAGA AAATACTGTA AGCTGGCTGA 
101 CTGCTGGTGA AGAAAATGCT TTATTTTTGT GGCAGGCATC TGTGGGATCT 
151 GTAATAGAAA TATATTGGAG TAATTCAAGA TTCTGTGGTT GGCCCTTTTG 
201 ACTGCTCTCT CTACAGGTTT AATTTGGGCA TTTACTCATT TTCATGGCTC 
251 CAAGGACCAT GTATGTGTTG GGGATCTTCA ATATTCATGT TATTTTCTCC 
301 TTTGGTCTTA TATGATTGTT ACCTTTATGA AGCTTTAGTG ATTACAAAGC 
351 ACTTTTTTTG TCCATTTTTA CCTGAGCTTT GTAAACTCTG ATTTGCAGGA 
401 TGGCTGGCTG TGGTGAAATT GATCATTCAA TAAACATGCT TCCTACAAAC 
451 AGGAAAGCGA ACGAGTCCTG TTCTAATACT GCACCTTCTT TAACCGTCCC 
501 TGAATGTGCC ATTTGTCTGC AAACATGTGT TCATCCAGTC AGTCTGCCCT 
551 GTAAGCACGT TTTCTGCTAT CTATGTGTAA AAGGAGCTTC ATGGCTTGGA 
601 AAGCGGTGTG CTCTTTGTCG ACAAGAAATT CCCGAGGATT TCCTTGACAA 
651 GCCAACCTTG TTGTCACCAG AAGAACTCAA GGCAGCAAGT AGAGGAAATG 
701 GTGAATATGC ATGGTATTAT GAAGGAAGAA ATGGGTGGTG GCAGTACGAT 
751 GAGCGCACTA GTAGAGAGCT GGAAGATGCT TTTTCCAAAG GTAAAAAGAA 
801 CACTGAAATG TTAATTGCTG GCTTTCTGTA TGTCGCTGAT CTTGAAAACA 
851 TGGTTCAATA TAGGAGAAAT GAACATGGAC GTCGCAGGAA GATTAAGCGA 
901 GATATAATAG ATATACCAAA GAAGGGAGTA GCTGGACTTA GGCTAGACTG 
951 TGATGCTAAT ACCGTAAACC TAGCAAGAGA GAGCTCTGCT GACGGAGCGG 
1001 ACAGTGTATC AGCACAGAGT GGAGCTTCTG TTCAGCCCCT AGTGTCTTCT 
1051 GTAAGGCCCC TAACATCAGT AGATGGTCAG TTAACAAGCC CTGCAACACC 
1101 ATCCCCTGAT GCAAGCACTT CTCTGGAAGA CTCTTTTGCT CATTTACAAC 
1151 TCAGTGGAGA CAACACAGCT GAAAGGAGTC ATAGGGGAGA AGGAGAAGAA 
1201 GATCATGAAT CACCATCTTC AGGCAGGGTA CCAGCACCAG ACACCTCCAT 
1251 TGAAGAAACT GAATCAGATG CCAGTAGTGA TAGTGAGGAT GTATCTGCAG 
1301 TTGTTGCACA GCACTCCTTG ACCCAACAGA GACTTTTGGT TTCTAATGCA 
1351 AACCAGACAG TACCCGATCG ATCAGATCGA TCGGGAACTG ATCGATCAGT 
1401 AGCAGGGGGT GGAACAGTGA GTGTCAGTGT CAGATCTAGA AGGCCTGATG 
1451 GACAGTGCAC AGTAACTGAA GTTTAAATAA AAATGTCTTC AGCTCCATGC 
1501 TCAAGGTTGA AAGGGTTACC TGTAAATTTC TGCCCACATA ACATTATACT 
1551 CATCCCTAGT AGTGCATTTT GGGAGTTGGG GTGGGAAGGG GTATGGGAAG 
1601 GATAGACTCA TAATTAAAAT GTCTAACATG TCTCTGTTGA GAAATTTATT 
1651 TAATGTAAGG AACTTGGGTG TTAATAGTTG AGAGCTGTTT AGTAATAACC 
1701 CAGTTTTCTT GAGGTCTGTT TACTTTATAC TTTTTAAAAA CTTCTGTAGT 
1751 TCTTTTGGCC AGTGTGTTTG TATTATCTGT GCATTAATGG TCCTCATCTG 
1801 ACTCCTGCAT TGTGTCTTAT TTTTCTGCAT GGATTGGCAT AAGACCATTA 
1851 CTAAAATTTG GCACCTGTGA GATGTTTGAT ATTATGAACA GGAAACATAA 
1901 TTTAATGTAT GAATAGATGT GAATTTGGGA TTTCAAAATA GATGAATAAC 
1951 AACTATTTTA TAGTAAAGTT ATTGAAATGG AAATGAAAAC AGCCAGTAAC 
2001 TTATGTTTCA GAATGTTTGT AACACACTTC ATGGTGTTCC CATAGGCTTT 
2051 GCTGTCTAGT CTTATAGTTT GAGGTTTTTT TGGTCTGCAT TTTTCTTTTT 
2101 GATTACAAAA TTTATAATTT AATAAATACT AGAGTTTATC AAAAAAAAAA 
2151 AAAAAAAG 



BLAST Results 



Entry HSG117 from database EMBL: 
human STS SHGC-36270. 
Score = 1148, P = 8.9e-45, identities = 240/250 

Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 400 bp to 1473 bp; peptide length: 358 
Category: similarity to unknown protein 
Prosite motifs: ZINC FINGER C3HC4 (51-61) 



1 MAGCGEIDHS INMLPTNRKA NESCSNTAPS LTVPECAICL QTCVHPVSLP 
51 CKHVFCYLCV KGASWLGKRC ALCRQEIPED FLDKPTLLSP EELKAASRGN 
101 GEYAWYYEGR NGWWQYDERT SRELEDAFSK GKKNTEMLIA GFLYVADLEN 
151 MVQYRRNEHG RRRKIKRDII DIPKKGVAGL RLDCDANTVN LARESSADGA 
201 DSVSAQSGAS VQPLVSSVRP LTSVDGQLTS PATPSPDAST SLEDSFAHLQ 
251 LSGDNTAERS HRGEGEEDHE SPSSGRVPAP DTSIEETESD ASSDSEDVSA 
301 VVAQHSLTQQ RLLVSNANQT VPDRSDRSGT DRSVAGGGTV SVSVRSRRPD 
351 GQCTVTEV 
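
As an illustration of how position-based Prosite hits relate to this peptide, the sketch below scans the sequence for the N-glycosylation pattern PS00001, N-{P}-[ST]-{P}, written as a regular expression. It assumes the coordinate convention that the hit lists for this clone appear to use (1-based start, end given as start plus pattern length); the function name `n_glycosylation_sites` is hypothetical and this is not the scanning tool used for the report.

```python
import re

# Peptide of DKFZphtes3_27ol4, frame 1, copied from the listing above.
PEPTIDE = (
    "MAGCGEIDHSINMLPTNRKANESCSNTAPSLTVPECAICLQTCVHPVSLP"
    "CKHVFCYLCVKGASWLGKRCALCRQEIPEDFLDKPTLLSPEELKAASRGN"
    "GEYAWYYEGRNGWWQYDERTSRELEDAFSKGKKNTEMLIAGFLYVADLEN"
    "MVQYRRNEHGRRRKIKRDIIDIPKKGVAGLRLDCDANTVNLARESSADGA"
    "DSVSAQSGASVQPLVSSVRPLTSVDGQLTSPATPSPDASTSLEDSFAHLQ"
    "LSGDNTAERSHRGEGEEDHESPSSGRVPAPDTSIEETESDASSDSEDVSA"
    "VVAQHSLTQQRLLVSNANQTVPDRSDRSGTDRSVAGGGTVSVSVRSRRPD"
    "GQCTVTEV"
)

# N-{P}-[ST]-{P}; the lookahead lets overlapping candidate sites be reported.
N_GLYC = re.compile(r"(?=N[^P][ST][^P])")


def n_glycosylation_sites(seq):
    """Return (start, end) pairs, 1-based start and end = start + 4 (assumed)."""
    sites = []
    for match in N_GLYC.finditer(seq):
        start = match.start() + 1
        sites.append((start, start + 4))
    return sites


print(n_glycosylation_sites(PEPTIDE))
```

With this convention the scan yields the two ASN_GLYCOSYLATION positions listed for this peptide in the Prosite report, 21->25 and 318->322.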

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_27ol4, frame 1 

TREMBL:CEC55A6_1 gene: "C55A6.1"; Caenorhabditis elegans cosmid C55A6, 
N = 2, Score = 165, P = 4.2e-15 

SWISSPROT:YWZ6_CAEEL HYPOTHETICAL 39.3 KD PROTEIN C02B8.6 IN CHROMOSOME 
X., N = 2, Score = 136, P = 3.1e-11 



>TREMBL:CEC55A6_1 gene: "C55A6.1"; Caenorhabditis elegans cosmid C55A6 
Length = 484 

HSPs: 

Score = 165 (24.8 bits), Expect = 4.2e-15, Sum P(2) = 4.2e-15 
Identities = 42/106 (39%), Positives = 61/106 (57%) 

Query: 75 QEIPEDFLDKPTLLSPEELKAASRGNGEYAWYYEGRN-GWWQYDERTSRELEDAFSKGKK 133 
Q +P LD ++ PEE K Y W Y G+N GWW+++ R RE+E+A++ GK 
Sbjct: 93 QNVPALDLDA-SICDPEERK Y-WIYSGKNQGWWRFEPRNEREIEEAYNAGKC 142 

Query: 134 NTEMLIAGFLYVADLENMVQYRRNEHGRRRKIKR DIID-IPKKGVAGL 180 
+ E++I G YV D +QY R + R +KR DDI KG+AG+ 
Sbjct: 143 HCEVVICGRPYVIDFHQFLQYPRGVPNQARHVKRVSADDFDGIGVKGLAGI 193 

Score = 96 (14.4 bits), Expect = 4.2e-15, Sum P(2) = 4.2e-15 
Identities = 19/54 (35%), Positives = 30/54 (55%) 

Query: 35 ECAICLQTCVHPVSLP-CKHVFCYLCVKGASW--LGKRCALCRQEIPEDFLDKPT 86 
EC IC + P ++P C H FC++C+KG +G C +CR I + +P+ 
Sbjct: 11 ECPICQCKMIVPTTIPACGHKFCFICLKGVYMNDMGG-CPMCRGPIDSNIFAQPS 64 

Pedant information for DKFZphtes3_27ol4, frame 1 



Report for DKFZphtes3_27ol4 . 1 

[LENGTH] 358 

[MW] 38818.90 

[pi] 5.17 

[HOMOL] TREMBL:CEC55A6_1 gene: "C55A6.1"; Caenorhabditis elegans cosmid C55A6 2e-12 

[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YCR066w] 3e-04 

[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YCR066w] 3e-04 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YCR066w] 3e-04 

[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YCR066w] 3e-04 

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YDR265w] 4e-04 

[FUNCAT] 30.19 peroxisomal organization [S. cerevisiae, YDR265w] 4e-04 

[BLOCKS] BL00518 Zinc finger, C3HC4 type, proteins 

[PROSITE] MYRISTYL 2 

[PROSITE] AMIDATION 3 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 12 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] ZINC_FINGER_C3HC4 1 

[PROSITE] PKC_PHOSPHO_SITE 9 

[PROSITE] ASN_GLYCOSYLATION 2 

[PFAM] Zinc finger, C3HC4 type (RING finger) 

[KW] Irregular 

[KW] 3D 

[KW] LOW_COMPLEXITY 19.83 % 



SEQ MAGCGEIDHSINMLPTNRKANESCSNTAPSLTVPECAICLQTCVHPVSLPCKHVFCYLCV 

SEG 

lrmd- TTTTTEETTTEEEETTTEEEEHHHH 

SEQ KGASWLGKRCALCRQEIPEDFLDKPTLLSPEELKAASRGNGEYAWYYEGRNGWWQYDERT 

SEG 

lrmd- HHHHHHCCBTTTTTCBCGGG-CBCC 

SEQ SRELEDAFSKGKKNTEMLIAGFLYVADLENMVQYRRNEHGRRRKIKRDIIDIPKKGVAGL 

SEG xxxxxxxxxxxxxxx 

lrmd- 

SEQ RLDCDANTVNLARESSADGADSVSAQSGASVQPLVSSVRPLTSVDGQLTSPATPSPDAST 

SEG xxxxxxxxxxxx 

lrmd- 

SEQ SLEDSFAHLQLSGDNTAERSHRGEGEEDHESPSSGRVPAPDTSIEETESDASSDSEDVSA 

SEG X xxxxxxxxxxxxxxxxxxxx 

lrmd- 

SEQ VVAQHSLTQQRLLVSNANQTVPDRSDRSGTDRSVAGGGTVSVSVRSRRPDGQCTVTEV 

SEG XXX xxxxxxxxxxxxxxxxxxxx 

lrmd- 

PS00001 21->25 ASN_GLYCOSYLATION PDOC00001 
PS00001 318->322 ASN_GLYCOSYLATION PDOC00001 
PS00004 132->136 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 16->19 PKC_PHOSPHO_SITE PDOC00005 
PS00005 120->123 PKC_PHOSPHO_SITE PDOC00005 
PS00005 217->220 PKC_PHOSPHO_SITE PDOC00005 
PS00005 260->263 PKC_PHOSPHO_SITE PDOC00005 
PS00005 274->277 PKC_PHOSPHO_SITE PDOC00005 
PS00005 325->328 PKC_PHOSPHO_SITE PDOC00005 
PS00005 330->333 PKC_PHOSPHO_SITE PDOC00005 
PS00005 343->346 PKC_PHOSPHO_SITE PDOC00005 
PS00005 346->349 PKC_PHOSPHO_SITE PDOC00005 
PS00006 32->36 CK2_PHOSPHO_SITE PDOC00006 
PS00006 89->93 CK2_PHOSPHO_SITE PDOC00006 
PS00006 120->124 CK2_PHOSPHO_SITE PDOC00006 
PS00006 195->199 CK2_PHOSPHO_SITE PDOC00006 
PS00006 222->226 CK2_PHOSPHO_SITE PDOC00006 
PS00006 240->244 CK2_PHOSPHO_SITE PDOC00006 
PS00006 282->286 CK2_PHOSPHO_SITE PDOC00006 
PS00006 287->291 CK2_PHOSPHO_SITE PDOC00006 
PS00006 293->297 CK2_PHOSPHO_SITE PDOC00006 
PS00006 320->324 CK2_PHOSPHO_SITE PDOC00006 
PS00006 328->332 CK2_PHOSPHO_SITE PDOC00006 
PS00006 354->358 CK2_PHOSPHO_SITE PDOC00006 
PS00007 98->107 TYR_PHOSPHO_SITE PDOC00007 
PS00008 329->335 MYRISTYL PDOC00008 
PS00008 337->343 MYRISTYL PDOC00008 
PS00009 66->70 AMIDATION PDOC00009 
PS00009 130->134 AMIDATION PDOC00009 
PS00009 159->163 AMIDATION PDOC00009 
PS00518 51->61 ZINC_FINGER_C3HC4 PDOC00449 

Pfam for DKFZphtes3_27ol4.1 



HMM_NAME Zinc finger, C3HC4 type (RING finger) 

HMM *CPICFcTFQlDyPWPFdePinMlPCgHsFCypCIrrW CPmC* 

C+IC L + P++LPC+H+FCY C++ C +C 

Query 36 CAIC LQT CVHPVSLPCKHVFCYLCVKGASWLGKRCALC 73 

DKFZphtes3_28dl4 



group: testes derived 

DKFZphtes3_28dl4 encodes a novel 97 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown 

complete cDNA, complete cds, EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 1279 bp 

Poly A stretch at pos. 1232, no polyadenylation signal found 



1 GGAGCTCAGA AGTTGGGCAA AGGTCACAGC AGACTTCCTG AAAAGCAGAC 

51 ACTGAGGAAC ACAGTGGAGA GCGGGAGTTC ACAGCGACGC AGCTGAGGAC 

101 GACGCAGGAC CTCTCCCAAA GGTGCTGCAG CTCCAGCACC AGGGGCCAGG 

151 GCTGCGGCGA CAGCAGCTCA GCAACCCTTG CTGTGCTCAA GTTCTTGGGG 

201 ATTCAGAGCT AAGTTCAAAA TTTAGAAACA GTGCCTTAAA GACGGGCAAG 

251 AAAACCCGGT GTGGGAGTCT GCTCATCTAT GGTTTGTTAC TGCTCTCGCT 

301 TTGATATTCT TAAATTCCTA GGTACCAATG AAAAAGCCAA GTGAACGTGG 

351 CAGAGTGAGG AGGAGACAGG AGCGTGTGCA CCTTCCATCT GTGAGAGGCA 

401 CACTTCAGTC TGGGTTCAAG ATGCAGAATG GTGCCTACAG CAAAAAAAAA 

451 AAAAACACCC TCCTCCCTTC TTTACCATTT GAATGGACAT TTTCCTTACC 

501 TGTGATCCCA ACAGAAACAG ATCCAGACCT ATCATGTGAA GTCCACGTTC 

551 CAGGATCAGA AGTAACCAGT TTATGGACTG AGCTTACACG GGAAAGTCTA 

601 CCCCCGACTC CTTCTGGATA GTAACATACA CAGCTGCATA AAAACGTCTC 

651 CAAGGGGACA TACGATGCAT TTGCTTGGTG TCCCAGCCAA GCTCCCCACC 

701 GGCGACCTCA CTGTTCCTTA GAGCTCGAGA GCTCGTCTCC TATCAATCAG 

751 AGAACCCCAT CAGCTGTGAC CAACAGAGCT GGAGCCCTCT GTGGAGGGAG 

801 CTGACCCCAC ACACAGGACA GAGCAGAATC CTGATTATTT TACAAACTGC 

851 AAACCTTCTG AGTAAGAAGA CAAAAATATA CATTCCAAGG TATCTGTAAA 

901 GTGCTTGGAA GATGCAGACA GCTGCACCGA GGGGCTCTGA TCCATCCACA 

951 CGCTGCGCTT TGCTGCGGTC ACACACACGG TCTCAGTCAC GTGATGGTTT 

1001 TGCTTTTATT TCTTAAACGG CTGAGTGATA ATCCAGCTAG TGTGCAGTCA 

1051 TTTCATACCT TTCAATGGGC GTCACCGCAG TGACGCTGCC CCAGCCCCAT 

1101 GCTGAGGGCC GACACAATTC ACGGAACAGA TTCATCATAT TTGGTCTTTA 

1151 TGTAAATAAT AAATGTTTTA AAATTGCCTA AATATAAAAA AAAAAAAAAA 

1201 AAAAAAAAAA AAAAAAAAAA AAAGGGCGGC CGAAAAAAAA AAAAAAAAAA 

1251 AAAAAAAAAA AAAAAAAAAA GGGCGGCCG 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 328 bp to 618 bp; peptide length: 97 
Category: putative protein 



1 MKKPSERGRV RRRQERVHLP SVRGTLQSGF KMQNGAYSKK KKNTLLPSLP 
51 FEWTFSLPVI PTETDPDLSC EVHVPGSEVT SLWTELTRES LPPTPSG 
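
The ORF coordinates and the peptide above can be cross-checked by translating the start of the reading frame. The following is a minimal sketch, assuming the standard genetic code; the helper name `translate` and the variable `orf_start` are illustrative, and the 30 bases are copied from positions 328 to 357 of the insert listing above.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate codon by codon, stopping at the first stop codon."""
    peptide = []
    for pos in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[pos:pos + 3].upper()]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Bases 328-357 of the DKFZphtes3_28dl4 insert (hypothetical variable name).
orf_start = "ATGAAAAAGCCAAGTGAACGTGGCAGAGTG"
print(translate(orf_start))
```

The printed residues should agree with the first ten residues of the peptide listing, which is a quick way to confirm that frame 1 and the start position 328 were read correctly from the listing.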

BLASTP hits 

No BLASTP hits available 




Alert BLASTP hits for DKFZphtes3_28dl4, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_28dl4, frame 1 



Report for DKFZphtes3_28dl4.1 



[LENGTH] 97 

[MW] 10945.56 

[pI] 9.80 

[PROSITE] MYRISTYL 2 

[PROSITE] CAMP_PHOSPHO_SITE 2 

[PROSITE] CK2_PHOSPHO_SITE 2 

[PROSITE] PKC_PHOSPHO_SITE 3 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 12.37 % 

SEQ MKKPSERGRVRRRQERVHLPSVRGTLQSGFKMQNGAYSKKKKNTLLPSLPFEWTFSLPVI 
SEG xxxxxxxxxxxx 



PRD cccccchhhhhhhhhhhccccccccccccccccccccccccccccccccccccccccccc 

SEQ PTETDPDLSCEVHVPGSEVTSLWTELTRESLPPTPSG 

SEG 

PRD ccccccccceeeecccccchhhhhhhhhhcccccccc 
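
The LOW_COMPLEXITY keyword value appears to be the share of residues masked on the SEG line. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the interpretation (masked residues divided by sequence length) is an assumption inferred from the numbers in this report rather than from Pedant documentation.

```python
def low_complexity_percent(masked_residues: int, length: int) -> float:
    """Percentage of residues masked as low complexity, to two decimals."""
    return round(100.0 * masked_residues / length, 2)


# Sequence and SEG mask as printed in the report for DKFZphtes3_28dl4.1.
seq = ("MKKPSERGRVRRRQERVHLPSVRGTLQSGFKMQNGAYSKKKKNTLLPSLPFEWTFSLPVI"
       "PTETDPDLSCEVHVPGSEVTSLWTELTRESLPPTPSG")
seg_mask = "xxxxxxxxxxxx"  # the position within the sequence is not recoverable here

print(len(seq), seg_mask.count("x"),
      low_complexity_percent(seg_mask.count("x"), len(seq)))
```

With 12 masked residues out of 97, the result agrees with the 12.37 % given on the [KW] line.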



Prosite for DKFZphtes3_28dl4.1 

PS00004   2->6    CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   41->45  CAMP_PHOSPHO_SITE   PDOC00004 
PS00005   5->8    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   21->24  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   38->41  PKC_PHOSPHO_SITE    PDOC00005 
PS00006   62->66  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   64->68  CK2_PHOSPHO_SITE    PDOC00006 
PS00008   24->30  MYRISTYL            PDOC00008 
PS00008   76->82  MYRISTYL            PDOC00008 



(No Pfam data available for DKFZphtes3_28dl4.1) 
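
The Prosite hits listed above for DKFZphtes3_28dl4.1 can be reproduced with a simple pattern scan. The sketch below is illustrative only: the regular expressions are hand translations of the PROSITE patterns for PS00004, PS00005, PS00006 and PS00008 (an assumption, not taken from this document), and the `start->end` convention, where the end is the start plus the pattern length, is inferred from the reported ranges.

```python
import re

SEQ = ("MKKPSERGRVRRRQERVHLPSVRGTLQSGFKMQNGAYSKKKKNTLLPSLPFEWTFSLPVI"
       "PTETDPDLSCEVHVPGSEVTSLWTELTRESLPPTPSG")

# Assumed regex equivalents of the PROSITE patterns.
PATTERNS = {
    "CAMP_PHOSPHO_SITE": r"[RK]{2}.[ST]",               # [RK](2)-x-[ST]
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",                   # [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",                # [ST]-x(2)-[DE]
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",       # G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
}


def scan(seq: str, regex: str) -> list:
    """Return (start, end) pairs; start is 1-based, end = start + match length.

    A lookahead is used so that overlapping matches are all reported.
    """
    hits = []
    for m in re.finditer(f"(?=({regex}))", seq):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits


for name, regex in PATTERNS.items():
    print(name, scan(SEQ, regex))
```

Under these assumptions the scan yields two cAMP sites, three PKC sites, two CK2 sites and two myristoylation sites, with the same coordinates as the table.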
DKFZphtes3_2all 



group: testes derived 

DKFZphtes3_2all encodes a novel 1048 amino acid protein with very weak similarity to mucins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to mucin 

complete cDNA, complete cds, EST hits 

Sequenced by EMBL 

Locus: unknown 

Insert length: 4082 bp 

Poly A stretch at pos. 4060, polyadenylation signal at pos. 4034 



1 GAGGACTGCG AGCACAGCGG CGGCCGGGTG GCGGGGGTGA GTGGGGCCAG 
51 CGGGGCTGGA CAGCAGCGGG CCCCGGGCGC CGCCGCCGCG ATCCCTCCCC 
101 GCGCCCGCCG AGCACATCGC CGCCGCCGAG ATGGGCCCTC CGCGGCACCC 
151 CCAGGCCGGC GAGATAGAAG CGGGCGGTGC GGGCGGCGGG CGGCGGCTAC 
201 AGGTGGAAAT GAGTTCTCAA CAGTTTCCTC GGTTAGGAGC CCCTTCTACC 
251 GGGCTGAGCC AGGCCCCTTC TCAGATTGCA AACAGTGGTT CTGCTGGATT 
301 GATAAACCCA GCTGCTACAG TCAATGATGA ATCTGGTCGA GATTCTGAAG 
351 TCAGTGCCAG GGAGCACATG AGTTCCAGCA GCTCCCTCCA GTCCCGGGAG 
401 GAGAAGCAAG AGCCTGTTGT GGTAAGGCCC TATCCACAGG TGCAGATGTT 
451 GTCGACACAC CATGCTGTCG CATCAGCCAC ACCTGTTGCA GTGACAGCCC 
501 CGCCAGCACA CCTGACGCCA GCAGTGCCAC TTTCATTTTC GGAGGGACTT 
551 ATGAAGCCGC CCCCGAAGCC CACCATGCCT AGCCGTCCCA TTGCTCCTGC 
601 TCCACCTTCT ACCCTGTCAC TTCCCCCCAA GGTTCCAGGG CAGGTTACCG 
651 TTACCATGGA GAGTAGCATC CCTCAAGCTT CAGCCATTCC TGTGGCAACA 
701 ATCAGTGGAC AACAGGGCCA TCCCAGTAAC CTGCATCACA TCATGACTAC 
751 AAATGTGCAA ATGTCTATCA TCCGCAGCAA TGCTCCTGGG CCCCCTCTTC 
801 ACATTGGAGC TTCTCATTTA CCTCGAGGTG CAGCTGCTGC TGCTGTGATG 
851 TCCAGTTCTA AAGTAACCAC AGTCCTGAGG CCGACCTCAC AGCTGCCAAA 
901 TGCTGCTACT GCTCAGCCAG CAGTACAGCA CATCATTCAC CAACCAATCC 
951 AGTCTCGGCC ACCTGTGACC ACCTCCAATG CCATCCCTCC TGCTGTGGTA 
1001 GCAACTGTCT CAGCCACCAG AGCTCAGTCT CCAGTCATCA CTACGACAGC 
1051 GGCGCATGCT ACTGATTCAG CACTTAGTAG GCCAACCTTG TCTATCCAGC 
1101 ATCCTCCATC TGCAGCAATC AGTATTCAGC GTCCTGCCCA GTCACGAGAT 
1151 GTCACAACAA GAATCACACT ACCATCTCAC CCTGCATTAG GGACGCCAAA 
1201 ACAGCAGCTT CATACAATGG CTCAGAAAAC AATCTTCAGT ACTGGCACGC 
1251 CAGTGGCTGC AGCCACAGTA GCACCTATTT TGGCAACCAA CACCATTCCT 
1301 TCAGCGACCA CAGCTGGATC TGTGTCACAC ACGCAAGCTC CCACAAGTAC 
1351 CATTGTTACC ATGACAGTAC CCTCCCATTC CTCCCATGCT ACTGCTGTGA 
1401 CCACCTCAAA CATCCCAGTC GCCAAGGTGG TGCCCCAGCA GATCACGCAC 
1451 ACTTCTCCTC GGATCCAGCC AGACTACCCT GCCGAGAGGA GTAGCCTGAT 
1501 TCCCATCTCC GGACATCGGG CCTCTCCCAA TCCTGTGGCC ATGGAAACCC 
1551 GAAGTGACAA CAGACCGTCT GTTCCCGTTC AGTTCCAATA TTTTTTGCCA 
1601 ACTTACCCCC CTTCTGCATA CCCACTGGCG GCACATACCT ACACCCCAAT 
1651 CACCAGTTCC GTGTCCACTA TCCGACAGTA TCCAGTTTCA GCTCAGGCTC 
1701 CAAACTCTGC CATCACAGCT CAGACTGGTG TTGGGGTAGC GTCTACCGTC 
1751 CACCTAAACC CCATGCAGTT GATGACAGTG GATGCATCGC ATGCTCGACA 
1801 TATTCAAGGG ATCCAGCCAG CACCCATCAG TACCCAGGGT ATCCAGCCGG 
1851 CCCCCATTGG GACCCCAGGG ATACAGCCTG CACCACTTGG CACACAGGGA 
1901 ATTCACTCAG CAACCCCAAT CAACACACAA GGGCTTCAGC CTGCACCTAT 
1951 GGGTACTCAG CAGCCTCAGC CTGAAGGAAA GACTTCAGCA GTGGTGTTGG 
2001 CAGATGGAGC CACAATTGTG GCCAACCCTA TTAGCAATCC ATTCAGTGCT 
2051 GCTCCAGCAG CAACAACCGT GGTGCAGACC CACAGCCAGA GTGCTAGCAC 
2101 CAACGCTCCC GCCCAGGGCT CATCGCCACG GCCAAGCATA CTCCGGAAGA 
2151 AACCTGCCAC AGATGGTGCC AAACCCAAGT CTGAAATCCA CGTGTCTATG 
2201 GCCACTCCGG TCACTGTGTC CATGGAGACT GTATCCAATC AAAATAATGA 
2251 TCAGCCTACC ATTGCCGTCC CTCCAACTGC CCAGCAGCCC CCACCGACCA 
2301 TTCCAACTAT GATTGCAGCA GCCAGTCCCC CGTCACAACC AGCCGTTGCC 
2351 CTTTCAACCA TTCCTGGAGC GGTCCCCATC ACTCCACCCA TCACCACCAT 
2401 TGCAGCTGCA CCACCTCCAT CAGTCACTGT GGGTGGCAGT CTTTCCTCCG 
2451 TCTTGGGCCC TCCCGTTCCT GAAATTAAAG TGAAAGAAGA AGTAGAACCA 
2501 ATGGATATCA TGAGGCCAGT TTCTGCAGTT CCTCCACTGG CTACCAACAC 
2551 TGTGTCTCCA TCTCTTGCAT TGCTGGCAAA CAACTTGTCC ATGCCTACAA 
2601 GTGACCTACC ACCTGGTGCC TCCCCAAGGA AAAAGCCTCG AAAGCAACAG 
2651 CATGTGATCT CAACAGAAGA AGGTGACATG ATGGAGACAA ACAGCACTGA 
2701 TGATGAGAAG TCCACTGCCA AGAGTCTTCT GGTGAAGGCT GAGAAGCGCA 
2751 AGTCTCCTCC CAAGGAGTAT ATTGATGAGG AAGGTGTGAG ATATGTCCCA 
2801 GTGCGTCCAA GACCCCCCAT TACTTTGCTT CGTCACTATC GGAACCCCTG 
2851 GAAAGCTGCT TACCACCACT TTCAGAGGTA CAGTGACGTC CGGGTCAAAG 
2901 AGGAGAAGAA AGCTATGCTG CAGGAAATAG CTAATCAGAA AGGAGTATCC 
2951 TGTCGTGCTC AAGGCTGGAA AGTCCACCTC TGTGCTGCCC AGTTACTACA 
3001 GCTGACGAAT CTAGAACATG ATGTCTATGA AAGACTTACT AACCTGCAGG 
3051 AAGGGATTAT CCCAAAGAAA AAAGCAGCAA CAGATGATGA TCTCCACCGA 
3101 ATAAACGAAC TGATACAGGG AAATATGCAG AGGTGTAAAC TTGTGATGGA 
3151 TCAAATCAGT GAAGCCAGAG ACTCCATGCT TAAGGTTTTA GATCATAAAG 
3201 ACCGTGTCCT GAAGCTGCTT AACAAGAACG GGACTGTCAA AAAAGTGTCC 
3251 AAATTGAAGC GAAAGGAAAA AGTCTAGACC CAGAACAATC AGGAGATTGG 
3301 AAGCAAATTT ATGAAGAATG ATGGTGGGGG TGGGGGGAGG GTTTTGGTTT 
3351 TTTCCAAAGT GGAACATTGA AATAAAGGAA GTGTTCCTTA GTTCCCGTGT 
3401 GAAAGCAGAG GAACCCATGA CATCCAAGGG CGTGAAAGGA TCAGAGCTGA 
3451 CTGGACATAG TGAGCTGCCT TCTTGCGTTC GGGTGCACCC CTGTTAAACC 
3501 TGATCTGTGT CATAAGTGAC TCCGGATGCA TCAGTGTCCA CCAGTTGGAA 
3551 GCAATGACAA GGATGGCTGG CTGGTGTTTT TCAGCCTTCC GGTTTATAGA 
3601 CTGTATTTAT CTAGTGGATT CCTGCAGGCC CCATACTGAG CCTGGACTGA 
3651 AAGTATCCAC TCGGACCATC TGTTATCTCT CTACACTGAA AATAAAACCT 
3701 CTTCCACCCA CCCCATTCGG TTCTTCTGCC TGACCTTCAA ATGCCCATGT 
3751 TGGCCTTTTA CAGCAGTGCC ACGGCACCAA GCGAGCTGCC ACATCTCACA 
3801 CTCTAAAGGG TTTGAACTAT TAGTTCTTGT CATTTTTTAA AAAAAACCAT 
3851 TCCCAAGTGA AATTGTTATA TCGTCTGTCT TGCGTGTGTC AGAACTGGGT 
3901 TTTTGTGGAG GTTCAGAGCA GGCAACACCA TAAGTTGCTC TCAGATCCTT 
3951 GTTCTGAAGT ACATTCTTGG TTATCTGTAC TTCTGTAGCT GGTGTGATGC 
4001 TGTTAATTGT ATGTACCACA CATCTCCAGA CGTTAATAAA GGACTCAAAG 
4051 AGGTTTTTGT AAAAAAAAAA AAAAAAAAAA AA 
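
The reported poly A and polyadenylation signal positions can be checked against the end of the insert. The sketch below assumes the canonical AATAAA hexamer as the signal and uses only the last 82 bases of the listing (positions 4001 to 4082); variable names are illustrative. The reported values appear consistent with zero-based offsets, which is an inference from this check rather than a documented convention.

```python
import re

TAIL_START = 4001  # 1-based position of the first base in `tail`
tail = ("TGTTAATTGTATGTACCACACATCTCCAGACGTTAATAAAGGACTCAAAG"
        "AGGTTTTTGT" + "A" * 22)

# Locate the assumed polyadenylation hexamer and the terminal run of A's.
signal_index = tail.find("AATAAA")
polya_index = re.search(r"A+$", tail).start()

# Zero-based offsets from the start of the whole insert.
signal_offset = TAIL_START - 1 + signal_index
polya_offset = TAIL_START - 1 + polya_index

print(signal_offset, polya_offset)
```

Counting from one instead, the hexamer begins at base 4035 and the poly A run at base 4061, so the figures in the report are each one less than the 1-based positions.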



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 131 bp to 3274 bp; peptide length: 1048 
Category: similarity to known protein 



1 MGPPRHPQAG EIEAGGAGGG RRLQVEMSSQ QFPRLGAPST GLSQAPSQIA 
51 NSGSAGLINP AATVNDESGR DSEVSAREHM SSSSSLQSRE EKQEPVVVRP 
101 YPQVQMLSTH HAVASATPVA VTAPPAHLTP AVPLSFSEGL MKPPPKPTMP 
151 SRPIAPAPPS TLSLPPKVPG QVTVTMESSI PQASAIPVAT ISGQQGHPSN 
201 LHHIMTTNVQ MSIIRSNAPG PPLHIGASHL PRGAAAAAVM SSSKVTTVLR 
251 PTSQLPNAAT AQPAVQHIIH QPIQSRPPVT TSNAIPPAVV ATVSATRAQS 
301 PVITTTAAHA TDSALSRPTL SIQHPPSAAI SIQRPAQSRD VTTRITLPSH 
351 PALGTPKQQL HTMAQKTIFS TGTPVAAATV APILATNTIP SATTAGSVSH 
401 TQAPTSTIVT MTVPSHSSHA TAVTTSNIPV AKVVPQQITH TSPRIQPDYP 
451 AERSSLIPIS GHRASPNPVA METRSDNRPS VPVQFQYFLP TYPPSAYPLA 
501 AHTYTPITSS VSTIRQYPVS AQAPNSAITA QTGVGVASTV HLNPMQLMTV 
551 DASHARHIQG IQPAPISTQG IQPAPIGTPG IQPAPLGTQG IHSATPINTQ 
601 GLQPAPMGTQ QPQPEGKTSA VVLADGATIV ANPISNPFSA APAATTVVQT 
651 HSQSASTNAP AQGSSPRPSI LRKKPATDGA KPKSEIHVSM ATPVTVSMET 
701 VSNQNNDQPT IAVPPTAQQP PPTIPTMIAA ASPPSQPAVA LSTIPGAVPI 
751 TPPITTIAAA PPPSVTVGGS LSSVLGPPVP EIKVKEEVEP MDIMRPVSAV 
801 PPLATNTVSP SLALLANNLS MPTSDLPPGA SPRKKPRKQQ HVISTEEGDM 
851 METNSTDDEK STAKSLLVKA EKRKSPPKEY IDEEGVRYVP VRPRPPITLL 
901 RHYRNPWKAA YHHFQRYSDV RVKEEKKAML QEIANQKGVS CRAQGWKVHL 
951 CAAQLLQLTN LEHDVYERLT NLQEGIIPKK KAATDDDLHR INELIQGNMQ 
1001 RCKLVMDQIS EARDSMLKVL DHKDRVLKLL NKNGTVKKVS KLKRKEKV 
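
The stated peptide lengths follow directly from the ORF coordinates when the range is read as inclusive and as ending on the last sense codon. A minimal sketch, with a hypothetical helper name:

```python
def orf_codons(start: int, end: int) -> int:
    """Number of codons in an inclusive, 1-based ORF range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3


print(orf_codons(131, 3274))  # DKFZphtes3_2all, frame 2
print(orf_codons(328, 618))   # DKFZphtes3_28dl4, frame 1
```

Both results agree with the peptide lengths given in the report (1048 and 97), which suggests the stop codon is not counted inside the ORF range.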

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2all, frame 2 

SWISSPROT:MUC2_HUMAN MUCIN 2 PRECURSOR (INTESTINAL MUCIN 2)., N = 1, 
Score = 334, P = 2.4e-25 




PIR:A43932 mucin 2 precursor, intestinal - human (fragments), N = 1, 
Score = 321, P = 3.2e-24 

TREMBL:D88440_1 product: "high molecular mass nuclear antigen"; Gallus 
gallus mRNA for high molecular mass nuclear antigen, partial cds., N = 
1, Score = 312, P = 8.3e-24 

PIR:S48478 glucan 1,4-alpha-glucosidase (EC 3.2.1.3) - yeast 
(Saccharomyces cerevisiae), N = 1, Score = 300, P = 2.1e-22 



>SWISSPROT:MUC2_HUMAN MUCIN 2 PRECURSOR (INTESTINAL MUCIN 2). 
Length = 5,179 

HSPs: 

Score = 334 (50.1 bits), Expect = 2.4e-25, P = 2.4e-25 
Identities = 184/770 (23%), Positives = 263/770 (34%) 



Query: 96 ... 
PP T ++TVTP TP + +PPPT 
Sbjct: 3471 ... 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 
P +T P PGTT +PT+GQP+ TTV + 
Sbjct: 3531 ... 

Query: 213 ... 
P+ + P +++ +TT 
Sbjct: 3590 ... 

Query: 269 ... 
PT P T + T +PTT 
Sbjct: 3650 ... 

Query: 329 ... IQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385 
Q P + TT P+ GT + T + T TP T PI 
Sbjct: 3707 ... 

Query: 386 ... 
T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 
Sbjct: 3767 ... 

Query: 444 ... 
+ P P +T + + P+ + PT P+ 
Sbjct: 3826 ... PTPTPTGTQTPT TTPITTTTTVTPTPTPTG--TQTP 3874 

Query: 503 ... 
T TPIT++ + T PQP+ITTV T QT 
Sbjct: 3875 ... 

Query: 561 ... 613 
P P TQ PI T P P GTQ + TPI T P P GTQ P 
Sbjct: 3933 ... 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 
PT+V T P+P+ TTT +Q+ +T ++ P+ 
Sbjct: 3992 ... 

Query: 672 ... 
TP +T + T P PT Q P T P 
Sbjct: 4052 ... 

Query: 729 ... PPSQPAVALSTIPGAVPITPPITTIAAAPPPS VTVGGSLSSVLGP-PVPEI 782 
P+ TP PIT TT+ PP+ T ++++PPP 
Sbjct: 4112 ... 

Query: 783 ... 
P+ V+ P P T T P+ A + TS+ PP +S 
Sbjct: 4170 ... 

Query: 842 ... 
+ TE ++ T 
Sbjct: 4230 PL-TESTTLLST 4240 

Score = 328 (49.2 bits), Expect = 1.0e-24, P = 1.0e-24 
Identities = 180/745 (24%), Positives = 254/745 (34%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 
VPP T++TVTP TP ++PPPT P 

Sbjct: 3540 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 3599 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 3600 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 3658 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T T P I 

Sbjct: 3659 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 3718 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T+PTT T + T++ P 
Sbjct: 3719 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 3775 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385 

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 3776 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3835 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 3836 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 3894 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 3895 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG— TQTP 3943 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ +T PQP+ITTV T QT 
Sbjct: 3944 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 4001 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 4002 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 4060 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

PT+V T P+P+ TTT +Q+ +T ++ P+ 

Sbjct: 4061 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 4120 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728 

T P+ TP +T + TPPTQPTP 

Sbjct: 4121 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 4180 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAA-PPPSVTVGGSLSSVLGPPVPEIKVKEE 787 

P+ TP T PI + + PPP + + S P + 

Sbjct: 4181 TTTVTPTPTPTGTQTGPPTHTSTAPIAELTTSNPPPESSTPQTSRSTSSPLTESTTLLST 4240 

Query: 788 VEPMDIMRPVSAVPPLATNTVSPSLALLANNLSMP — TSDLPPGASPR 833 

+ P M S PP +T T +P+ + LS P T+ PPG R 

Sbjct: 4241 LPPAIEM — TSTAPP-STPT-APTTTSGGHTLSPPPSTTTSPPGTPTR 4284 
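
Because each alignment row prints its start and end coordinates, a row can be checked for transcription damage by counting its residues. The sketch below is a hypothetical helper, not part of the original analysis; the two consistent rows are copied from the alignment above, and the third is a made-up example of the kind of damage that arises when two adjacent V residues are read as a single W.

```python
def residues_in_row(aligned: str) -> int:
    """Count residue letters, ignoring gap characters and spaces."""
    return sum(1 for ch in aligned if ch.isalpha())


def row_is_consistent(start: int, aligned: str, end: int) -> bool:
    """True when the letter count matches the printed coordinate span."""
    return residues_in_row(aligned) == end - start + 1


query_row = (155, "A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS", 212)
sbjct_row = (3600, "TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT", 3658)
# Hypothetical damaged row: "VVV" at the start has been read as "WV".
damaged_row = (96, "WVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI", 154)

for row in (query_row, sbjct_row, damaged_row):
    print(row[0], row[2], row_is_consistent(*row))
```

A row that fails this check is short or long by the number of residues lost or gained, which narrows down where to compare it against the peptide listing.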

Score = 325 (48.8 bits), Expect = 2.2e-24, P = 2.2e-24 
Identities = 186/782 (23%), Positives = 261/782 (33%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T + + TVTP TP + +PPPT P 

Sbjct: 3494 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 3553 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TT V + 

Sbjct: 3554 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 3612 

Query: 213 IIRSNAPGP---PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T TP I 

Sbjct: 3613 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 3672 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ P T P T + T+PTT T + T++ P 
Sbjct: 3673 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 3729 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385 

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 3730 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3789 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 3790 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 3848 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 3849 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG — TQTP 3897 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 
T TPIT++ +T PQP+ITTV T QT 
Sbjct: 3898 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 3955 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 3956 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 4014 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

PT+V T P+P+ TTT +Q+ +T ++ P+ 

Sbjct: 4015 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 4074 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728 

T P+ TP +T+ TPPTQPTP 

Sbjct: 4075 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPTPTGTQTPTTTPITT 4134 



Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P I V 
Sbjct: 4135 TTTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT--GTQT PTTTPITTTTTV 4184 

Query: 789 EPMDIMRPVSAVPPLATNTVSPSLALLANNLSMPTSDLPPGASPRKKPRKQQHVISTEEG 848 

P PP T+T +P L+N PSP+ P + + + 

Sbjct: 4185 TPTPTPTGTQTGPPTHTST-APIAELTTSN-PPPESSTPQTSRSTSSPLTESTTLLSTLP 4242 

Query: 849 DMMETNSTDDEKSTAKSLLVKAEKRKSPP 877 

+E ST + SPP 
Sbjct: 4243 PAIEMTSTAPPSTPTAPTTTSGGHTLSPP 4271 
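
The percentages on the Identities and Positives lines in this report appear to be truncated rather than rounded. A short sketch that checks this against the counts printed in the HSP headers of this section; the function name is illustrative.

```python
def blast_percent(count: int, total: int) -> int:
    """Integer percentage obtained by truncation (floor division)."""
    return (100 * count) // total


# (count, alignment length, percentage as printed in this section)
reported = [
    (184, 770, 23), (263, 770, 34),
    (180, 745, 24), (254, 745, 34),
    (186, 782, 23), (261, 782, 33),
    (170, 717, 23), (248, 717, 34),
    (174, 717, 24), (243, 717, 33),
]

for count, total, printed in reported:
    print(count, total, blast_percent(count, total), printed)
```

For example, 184/770 is about 23.9 %, which the report prints as 23 %; ordinary rounding would have given 24 %.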

Score = 324 (48.6 bits), Expect = 2.8e-24, P = 2.8e-24 
Identities = 170/717 (23%), Positives = 248/717 (34%) 

Query: 95 PVVVRPYPQVQMLSTHHAVASATP--VAVTAPPAHLTPAVPLSFSEGLMKPPPKPTMPSR 152 

P P P +T + +p T PP TP+ P++ + + P P+ P 

Sbjct: 1401 PPTTTPSPPPTTTTTLPPTTTPSPPTTTTTTPPPTTTPSPPITTTTTPL-PTTTPSPPIS 1459 

Query: 153 PIAPAPPSTLSLPPKVPGQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

PP+T PP TS + P T + P I + 

Sbjct: 1460 TTTTPPPTTTPSPPTTTPSPPTTTPSPPTTTTTTPPPTTTPS PPMTTPITPPASTTT 1516 

Query: 213 IIRSNAPGPPLHIGASHLPRGAAAAAVMSSSKVTTVLRPTSQ--LPNAATAQPAVQHIIH 270 

+ + PPP + P S T + PTS LP TP 

Sbjct: 1517 LPPTTTPSPPTTTTTTPPP TTTPSPPTTTPITPPTSTTTLPPTTTPSPPPTTTTT 1571 

Query: 271 QPIQSRP-PVTTSNAIPPAVVATVSA-TRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

P + P P TT+ PP+T T SP TTT + S PT + PP++ 

Sbjct: 1572 PPPTTTPSPPTTTTPSPPTITTTTPPPTTTPSPPTTTTTTPPPTTTPSPPTTTPITPPTS 1631 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKTIFSTGTPVAAATVAPILATNT 388 

++ T T P P TP T I +T TP T + + T 

Sbjct: 1632 TTTLPPTTTPSPPPTTTTTP--PPTTTPSPPTTTTPSPPITTTTTPPPTTTPSSPITTTP 1689 

Query: 389 IPSATTAGSVSHTQAPTSTIVTMTVPSHSSHATAV-TTSNIPVAKVVPQQITHTSPRIQP 447 

P TT + S T P+S ITTPS+++ TT P P TT + P 

Sbjct: 1690 SPPTTTMTTPSPTTTPSSPITTTTTPSSTTTPSPPPTTMTTPSPTTTPSPPTTTMTTLPP 1749 

Query: 448 DYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPV-QFQYFLPTYPPSAY-P — LA 500 

+ + P+ P T + P VP+ + +L + P+ + P L 

Sbjct: 1750 TTTSSPLTTTPLPPSITPPTFSPFSTTTPTTPCVPLCNWTGWLDSGKPNFHKPGGDTELI 1809 

Query: 501 AHTYTPITSSVSTIR — QYP-VSAQAPNSAITAQTGVG-VASTVHLNPMQLMTVDASHAR 556 

P ++ + R YP V + VG + P ++ + A 

Sbjct: 1810 GDVCGPGWAANISCRATMYPDVPIGQLGQTVVCDVSVGLICKNEDQKPGGVIPM-AFCLN 1868 

Query: 557 HIQGIQPAPISTQGIQPAPIGTPGIQ-PAPLGTQGIHSATPINTQGLQPAPMGTQQPQ— 613 

+ +Q TQ P + T + PPTI+T + PP GTQ P 

Sbjct: 1869 YEINVQCCECVTQ PTTMTTTTTENPTPPTTTPITTTTTVTPTPTPTGTQTPTTT 1922 

Query: 614 PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSILR 672 

PT+V T P+P+ TTT +Q+ +T ++ P+ 

Sbjct: 1923 PITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP 1982 

Query: 673 KKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMIA 729 

T P+ TP +T+ TPPTQPTP 

Sbjct: 1983 TGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTT 2042 

Query: 730 AASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEVE 789 

P+ TP PIT TT PP+T G+ + P V 
Sbjct: 2043 TTVTPTPTPTGTQT-PTTTPIT TTTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTPT 2096 

Query: 790 PMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 2097 PTGTQTPTTT-PITTTTTVTPT 2117 
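
The Expect and P values in the HSP headers coincide because, for very small E, the Poisson probability of at least one chance hit is numerically equal to E. A minimal sketch, assuming the usual relation P = 1 - exp(-E) used by BLAST-family programs that report both values; the function name is illustrative.

```python
import math


def p_from_expect(expect: float) -> float:
    """Probability of at least one chance HSP, given the expected number E."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    return -math.expm1(-expect)


print(p_from_expect(2.8e-24))  # essentially equal to E
print(p_from_expect(1.0))      # diverges from E once E is no longer small
```

The two values only separate visibly once E approaches 1, which is far outside the range of the hits reported here.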

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23 
Identities = 174/717 (24%), Positives = 243/717 (33%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T ++TVTP TP + +PPPT P 
Sbjct: 2068 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2127 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 2128 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 2186 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T TP I 

Sbjct: 2187 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2246 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T+T+PTT T + T++ P 
Sbjct: 2247 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 2303 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385 

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 2304 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2363 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 2364 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 2422 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 2423 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG--TQTP 2471 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ + T PQP+ITTV T QT 
Sbjct: 2472 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 2529 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2530 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 2588 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

PT+V T P+P+ TTT +Q+ +T ++ P+ 

Sbjct: 2589 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 2648 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728 

T P+ TP +T + TPPTQPTP 

Sbjct: 2649 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2708 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 
Sbjct: 2709 TTTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTP 2762 

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 2763 TPTGTQTPTTT-PITTTTTVTPT 2784 
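
Each HSP header gives a raw score followed by a bit score. In this section the two appear to be related by a constant factor of about 0.15 bits per raw score unit. That factor is an empirical observation from the printed values, not a documented parameter of the search, so the sketch below should be read only as a consistency check that can flag a mis-transcribed score.

```python
BITS_PER_RAW = 0.15  # empirical factor, assumed from the values printed here

# (raw score, bit score) pairs as printed in the HSP headers of this section.
reported = [(334, 50.1), (328, 49.2), (325, 48.8), (324, 48.6), (318, 47.7)]


def bits_consistent(raw: int, bits: float, tol: float = 0.051) -> bool:
    """True when the printed bit score is within rounding of raw * factor."""
    return abs(raw * BITS_PER_RAW - bits) <= tol


for raw, bits in reported:
    print(raw, bits, bits_consistent(raw, bits))
```

A header whose bit score falls outside this tolerance would be a candidate for a transcription error in one of the two numbers.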

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23 
Identities = 174/717 (24%), Positives = 243/717 (33%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T ++TVTP TP + +PPPT P 
Sbjct: 2206 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2265 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT + P T +G Q P+ TTV + 

Sbjct: 2266 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 2324 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T TPI 

Sbjct: 2325 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2384 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T+PTT T + T++ P 
Sbjct: 2385 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 2441 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385 

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 2442 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2501 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 
T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 
Sbjct: 2502 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 2560 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 2561 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG — TQTP 2609 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ + T PQP+ITTV T QT 
Sbjct: 2610 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 2667 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2668 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 2726 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

PT+V T P + P+ TTT +Q+ +T ++ P+ 

Sbjct: 2727 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 2786 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728 

T P + TP +T+ TPPTQPTP 

Sbjct: 2787 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2846 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 
Sbjct: 2847 TTTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTP 2900 

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 2901 TPTGTQTPTTT-PITTTTTVTPT 2922 

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23 
Identities = 174/717 (24%), Positives = 243/717 (33%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T ++TVTP TP + +PPPT P 

Sbjct: 2321 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2380 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TT V + 

Sbjct: 2381 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 2439 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T TPI 

Sbjct: 2440 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2499 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ P T P T + T+PTT T + T++ P 

Sbjct: 2500 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT---PTPT 2556 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385 

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 2557 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2616 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 2617 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 2675 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 2676 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG — TQTP 2724 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ + T PQP+ITTV T QT 
Sbjct: 2725 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 2782 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2783 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 2841 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

PT+V T P + P+ TTT +Q+ +T ++ P+ 

Sbjct: 2842 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 2901 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728 

T P+ TP +T+ TPPTQPTP 

Sbjct: 2902 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2961 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 
Sbjct: 2962 TTTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTP 3015 

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 3016 TPTGTQTPTTT-PITTTTTVTPT 3037 

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23 
Identities = 174/717 (24%), Positives = 243/717 (33%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T ++TVTP TP + +PPPT P 
Sbjct: 2390 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2449 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TT V + 

Sbjct: 2450 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 2508 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T TP I 

Sbjct: 2509 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2568 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T+PTT T + T++ P 
Sbjct: 2569 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 2625 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385 

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 2626 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2685 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 2686 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 2744 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 2745 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG--TQTP 2793 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ + T PQP+ITTV T QT 
Sbjct: 2794 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 2851 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2852 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 2910 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

PT+V T P + P+ TTT +Q+ +T ++ P+ 

Sbjct: 2911 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 2970 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728 

T P + TP +T + TPPTQPTP 

Sbjct: 2971 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3030 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 
Sbjct: 3031 TTTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTP 3084 

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 3085 TPTGTQTPTTT-PITTTTTVTPT 3106 

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23 
Identities = 174/717 (24%), Positives = 243/717 (33%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T ++TVTP TP + +PPPT P 
Sbjct: 2459 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2518 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TT V + 

Sbjct: 2519 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 2577 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T TPI 

Sbjct: 2578 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2637 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T+PTT T + T++ P 
Sbjct: 2638 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 2694 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385 

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 2695 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2754 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 2755 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 2813 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 2814 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG — TQTP 2862 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ + T PQP+ITTV T QT 
Sbjct: 2863 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 2920 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2921 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 2979

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671

PT+V T P + P+ TTT +Q+ +T ++ P+ 

Sbjct: 2980 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 3039 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728 

T P+ TP +T+ TPPTQPTP 

Sbjct: 3040 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3099 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 
Sbjct: 3100 TTTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTP 3153

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 3154 TPTGTQTPTTT-PITTTTTVTPT 3175 

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23 
Identities = 174/717 (24%), Positives = 243/717 (33%)

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154

V PP T ++TVTP TP + +PPPT P 

Sbjct: 2528 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2587 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TT V + 

Sbjct: 2588 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 2646

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T TPI 

Sbjct: 2647 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2706 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328

+ PT P T + T+PTT T + T++ P 
Sbjct: 2707 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 2763 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385 

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 2764 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2823 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 2824 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 2882 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + + P P +T + + P+ + PT P+ 

Sbjct: 2883 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG--TQTP 2931

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ + T PQP+ITTV T QT 
Sbjct: 2932 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 2989

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2990 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 3048 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671

PT+V T P + P+ TTT +Q+ +T ++ P+ 

Sbjct: 3049 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 3108

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728 

T P+ TP +T+ TPPTQPTP 

Sbjct: 3109 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3168 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V
Sbjct: 3169 TTTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTP 3222

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 3223 TPTGTQTPTTT-PITTTTTVTPT 3244 

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23
Identities = 174/717 (24%), Positives = 243/717 (33%)

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154

VPP T+ + TVTP TP ++PPPT P 

Sbjct: 3080 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 3139

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TT V + 

Sbjct: 3140 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 3198 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T T P I 

Sbjct: 3199 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 3258 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T+PTT T + T++ P 
Sbjct: 3259 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 3315 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 3316 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3375

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 3376 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 3434 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 3435 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG--TQTP 3483

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ + T PQP+ITTV T QT 
Sbjct: 3484 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 3541

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 3542 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 3600

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671

PT+V T P+P+ TTT +Q+ +T ++ P+ 

Sbjct: 3601 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 3660 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728 

T P+ TP +T + TPPTQPTP 

Sbjct: 3661 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3720 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 
Sbjct: 3721 TTTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTP 3774

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 3775 TPTGTQTPTTT-PITTTTTVTPT 3796 

Score = 313 (47.0 bits), Expect = 4.2e-23, P = 4.2e-23 
Identities = 169/695 (24%), Positives = 245/695 (35%)

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154

V PP T ++TVTP TP + +PPPT P 

Sbjct: 3655 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 3714

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212

P +T P PGTT +PT+GQP+ TT V + 

Sbjct: 3715 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 3773 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T TPI

Sbjct: 3774 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 3833 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328

+ PT P T + T+PTT T + T++ P 
Sbjct: 3834 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 3890 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 3891 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3950 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 3951 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 4009 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+

Sbjct: 4010 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG--TQTP 4058

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ +T PQP+ITTV T QT 
Sbjct: 4059 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 4116

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQP 614 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 4117 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPT- 4174

Query: 615 EGKTSAVVLADGATIVANPISNPFSAAPAATTVVQTHSQSASTNAPAQGSSPRPSILRKK 674 

T+ + T+ P P T ++ ++N P + S+P+ S 

Sbjct: 4175 TTPITTT--TTVTPTPTPTGTQTGPPTHTSTAPIAELTTSNPPPESSTPQTSRSTSS 4229

Query: 675 PATDGAKPKSEIH--VSMATPVTVSMETVSNQNNDQPTIAVPP-TAQQPP--PTIPTMIA 729 

P T+ S + + M + ST + T++ PP T PP PT T 

Sbjct: 4230 PLTESTTLLSTLPPAIEMTSTAPPSTPTAPTTTSGGHTLSPPPSTTTSPPGTPTRGTTTG 4289 

Query: 730 AASPPSQPAVALSTI PGAVPITPP--ITTIAAAP-PPSVTVGGSLSSVLGPPVPEI 782

++S P+ V +T P P++ PIT P P SV + L+ P E+ 

Sbjct: 4290 SSSAPTPSTVQTTTTSAWTPTPTPLSTPSIIRTTGLRPYPSSVLICCVLNDTYYAPGEEV 4349 

Score = 279 (41.9 bits), Expect = 1.8e-19, P = 1.8e-19
Identities = 138/540 (25%), Positives = 194/540 (35%) 

Query: 278 PVTTSNAIPPAVVATVSATRAQSPVITTTAAH ATDSALSRP--TLSIQHPPSAA 329

P+TT+ + P T + T +P+ TTT T + + P T + P 

Sbjct: 1946 PITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP 2005

Query: 330 ISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILAT 386

Q P + TT P+ GT + T + T TP T PI T 

Sbjct: 2006 TGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTT 2065 

Query: 387 NTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSPR 444

T+ P+ T G+ + T P +T T+T P++ TTT VP TT + 

Sbjct: 2066 TTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGTQ 2124 

Query: 445 IQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAHT 503 

P ++ + +P P +T + + P+ + PT P+ T

Sbjct: 2125 TPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG--TQTPT 2173

Query: 504 YTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQGI 561

TPIT++ +T PQP+ITTV T QT 
Sbjct: 2174 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVTP 2231 

Query: 562 QPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ-- 613

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2232 TPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTTT 2290 

Query: 614 PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSILR 672 

PT+V T P+P+ TTT +Q+ +T ++ P+ 

Sbjct: 2291 PITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP 2350 

Query: 673 KKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMIA 729

T P+ TP +T+ TPPTQPTP 

Sbjct: 2351 TGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTT 2410

Query: 730 AASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEVE 789 

P+ TP PIT TT P P+ T G+ + P V 
Sbjct: 2411 TTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTPT 2464

Query: 790 PMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 2465 PTGTQTPTTT-PITTTTTVTPT 2485 
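
Each alignment in this listing is introduced by a two-line header giving the raw score, the bit score, the Expect and P values, and the identity and positive counts. A minimal Python sketch of how the header of the alignment above could be read back into numbers is given here; the helper name `parse_hsp_header` and the returned keys are hypothetical, and the header text is typed with the OCR-damaged "=" signs restored. The percentages are recomputed with integer division because the printed values appear to be truncated rather than rounded (194/540 is 35.9%, printed as 35%); that is an inference from the listing, not a documented rule.

```python
import re

# Header of the alignment above, with OCR-damaged "=" signs restored.
HEADER = (
    "Score = 279 (41.9 bits), Expect = 1.8e-19, P = 1.8e-19\n"
    "Identities = 138/540 (25%), Positives = 194/540 (35%)"
)

# A plain or exponent-form number such as 41.9 or 1.8e-19.
NUM = r"([0-9.]+(?:e[+-]?[0-9]+)?)"


def parse_hsp_header(text):
    """Hypothetical helper: extract the numeric fields of one HSP header."""
    score = re.search(
        rf"Score = (\d+) \(([0-9.]+) bits\), Expect = {NUM}, P = {NUM}", text
    )
    counts = re.search(
        r"Identities = (\d+)/(\d+) \(\d+%\), Positives = (\d+)/(\d+) \(\d+%\)",
        text,
    )
    if score is None or counts is None:
        raise ValueError("not a recognisable HSP header")
    identities, length, positives, _ = (int(g) for g in counts.groups())
    return {
        "score": int(score.group(1)),
        "bits": float(score.group(2)),
        "expect": float(score.group(3)),
        "p": float(score.group(4)),
        "identities": identities,
        "positives": positives,
        "length": length,
        # Integer division reproduces the truncated percentages in the listing.
        "identity_pct": identities * 100 // length,
        "positive_pct": positives * 100 // length,
    }


if __name__ == "__main__":
    print(parse_hsp_header(HEADER))
```

Headers whose operators are still garbled (for example "Expect »") would need to be cleaned before this pattern matches them.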

Score = 265 (39.8 bits), Expect = 5.8e-18, P = 5.8e-18
Identities = 179/746 (23%), Positives = 257/746 (34%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154

V PP T + + TVTP TP + +PPPT P 

Sbjct: 3678 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 3737

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 3738 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 3796 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T T P I 

Sbjct: 3797 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 3856 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T+PTT T + T++ P 
Sbjct: 3857 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 3913 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 3914 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3973 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 3974 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 4032 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502

+ P ++ + +P P +T + + P+ + PT P+

Sbjct: 4033 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG--TQTP 4081

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ +T PQP+ITTV T QT 
Sbjct: 4082 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 4139

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQP 614

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 4140 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTGPP 4198

Query: 615 EGKTSAVVLADGATIVANPISNPFSAAPAATTVVQTHSQSA-STNAPA--QGSSPRP 668

TS +A+ T +NP P S+ P +T+ T S + ST PA S+ P 
Sbjct: 4199 T-HTSTAPIAELTT--SNP--PPESSTPQTSRSTSSPLTESTTLLSTLPPAIEMTSTAPP 4253

Query: 669 SILRKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVPPTAQQPPPTIPTMI 728 

S T G S + +P + ++ PT + T T PT 

Sbjct: 4254 STPTAPTTTSGGHTLSPPPSTTTSPPGTPTRGTTTGSSSAPTPSTVQTTTTSAWT-PTPT 4312

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

++P h P +V I + AP V G+ + E 

Sbjct: 4313 PLSTPSIIRTTGLRPYPSSVLICCVLNDTYYAPGEEV-YNGTYGDTCYFVNCSLSCTLEF 4371

Query: 789 EPMDIMRPVSAVPPLATNTVSPSLALLANNLSMPTSDLPPGASPRKKPRKQQH 841 

S P + +T +PS ++ S PT P P P +Q++ 
Sbjct: 4372 YNWSCPSTPSPTPTPSKSTPTPSKP--SSTPSKPTPGTKPPECPDFDPPRQEN 4422

Score = 254 (38.1 bits), Expect = 8.7e-17, P = 8.7e-17 
Identities = 167/697 (23%), Positives = 245/697 (35%)

Query: 115 SATPVAVTAPPAHLTPAVPLSFSEGLMKPPPK--PTMPSR-PIAPAPPSTLSLPPKV-PG 170

S + T PP TP+ P + + PPP P+ P+ PI P P ST +LPP P

Sbjct: 1587 SPPTITTTTPPPTTTPSPPTTTTT TPPPTTTPSPPTTTPITP-PTSTTTLPPTTTPS 1642 

Query: 171 QVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMSIIRSNAPGPPLHIGASHL 230 

T + P + PT+ + TT I + PPP + 

Sbjct: 1643 PPPTTTTTPPPTTTPSPPTTTTPSPPITTTTTPPPTTTPSSPI--TTTPSPPTTTMTTPS 1700

Query: 231 PRGAAAAAVMSSSKVTTVLRPTSQLPNAATAQPAVQHIIHQPIQS-RPPVTTSNAIPPAV 289 

P SS +TT P+S + P P + PP TT +PP 

Sbjct: 1701 P TTTPSSPITTTTTPSS TTTPSPPPTTMTTPSPTTTPSPPTTTMTTLPPTT 1751 

Query: 290 VATVSATRAQSPVITT-TAAHATDSALSRPTLSIQH PPSAAI SIQRPAQSRDVTTR 344 

++ T PITT++++P + + + S ++P ++ 

Sbjct: 1752 TSSPLTTTPLPPSITPPTFSPFSTTTPTTPCVPLCNWTGWLDSGKPNFHKPGGDTELIGD 1811 

Query: 345 ITLPSHPALGTPKQQLHTMAQKTIFSTGTPVAAATVAPILATN TIPSATTAGS 397 

+ PA++++ IGV ++N IP A 

Sbjct: 1812 VCGPGWAANISCRATMYP--DVPIGQLGQTVVCDVSVGLICKNEDQKPGGVIPMAFCLNY 1869 

Query: 398 VSHTQAPTSTI--VTMTVPSHSSHATAVTTSNIPVAKVVPQQITHTSPRIQPDYPAERSS 455

+ Q TMT + + + T TT+ I V T T + P ++ 

Sbjct: 1870 EINVQCCECVTQPTTMTTTT-TENPTPPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTT 1928

Query: 456 LIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAHTYTPITSSVS-T 513 

+ +P P +T + + P+ + PT P+ T TPIT++ + T 

Sbjct: 1929 TVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG--TQTPTTTPITTTTTVT 1977

Query: 514 IRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQGIQPAPISTQGIQ 572 

Q P + IT TV

Sbjct: 1978 ...-TGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2035

Query: 573 ...-QPAPMGTQQPQ--PEGKTSAVVLA 624

P P GTQ P P T+ V

Sbjct: 2036 ...

Query: 625 ...

Sbjct: 2095 ...

Query: 664 ...

Sbjct: 2155 ...

Query: 741 ...

Sbjct: 2215 ... 2267

Query: 801 ...

Sbjct: 2268 ...

Score = 243, Expect = 1.3e-15

Query: 121 ...

Sbjct: 1396 ...

Query: 180 ...

P++T + P+ TT + P PP + P

Sbjct: 1453 ...-PISTTTTPP--PTTTPSPPTTTPSPP TTTPSPPTTTTTTPPP TT 1498

Query: 240 ...

Sbjct: 1499 ...

Query: 296 ...

Sbjct: 1559 ...-PPTT 1616

Query: 355 ...

Sbjct: 1617 ...

Query: 415 ...

Sbjct: 1676 ...

Query: 474 ...

S PS P LP S+ PL T TP+ S++ PS P +

Sbjct: 1731 PSPTTTPSPPTTTMTTLPPTTTSS-PL TTTPL PPSITPPTFSP FSTTT PTT 1780

Score = 189 (28.4 bits), Expect = 8.0e-09, P = 8.0e-09
Identities = 92/374 (24%), Positives = 133/374 (35%)

Query: 439 ...

Sbjct: 1398 ... 1456

Query: 498 ...

Sbjct: 1457 ...

Query: 557 ...

P +T L P T P P

Sbjct: 1517 ...-TPPTSTTTLPP TTTPSPPP 1566

Query: 617 ...QSASTNAP--AQGSSPRPSILRKK 674

+T +P ++P P+

Sbjct: 1567 ...PPTTTPSPPTTTTTTPPPTTTPSP 1620

Query: 675 ...

+ P T + PT PPT P P I T

Sbjct: 1621 ...-TTTLPPTTTPSPPPTTTTTPPPTTTPSPPTTTTPSPPITTTTTPPPT 1678

Query: 732 ...-PEIKVK 785

Sbjct: 1679 ...

Query: 786 EEVEPMDIMRPVSAVPPLATNTVSPSL 812

M + P + PL T + PS+ 
Sbjct: 1739 PPTTTMTTLPPTTTSSPLTTTPLPPSI 1765 

Score = 185 (27.8 bits), Expect = 1.6e-09, P = 1.6e-09
Identities = 71/270 (26%), Positives = 99/270 (36%)

Query: 563 PAPISTQGIQPAPIGTPGIQPAPLGTQGIHSATP INTQGLQPAPMGTQQPQ PEG 616 

P+P +T P P TP P T + + TP I+T P P T P P 
Sbjct: 1422 PSPPTTTTTTPPPTTTPS-PPITTTTTPLPTTTPSPPISTT-TTPPPTTTPSPPTTTPSP 1479 

Query: 617 KTSAVVLADGATIVANPISNPFSAAPAATTVVQTHSQSASTNAPAQGSSPRPSILRKKPA 676 

T+ T P + P +P TT + T S +T P SP + P
Sbjct: 1480 PTTTPSPPTTTTTTPPPTTTP SPPMTTPI-TPPASTTTLPPTTTPSPPTTTTTTPPP 1535 

Query: 677 TDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVPPTAQQPPPTIPTMIAAASPPSQ 736 

T P + TP+T T + P+ P T PPPT + PS 

Sbjct: 1536 TTTPSPPT TTPITPPTSTTTLPPTTTPS-PPPTTTTTPPPTTTPSPPTTTTPSP 1588 

Query: 737 PAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEVEPMDIMRP 796 

P + +T P +PP TT PPP+ T ++ + PP + P P 

Sbjct: 1589 PTITTTTPPPTTTPSPPTTT-TTTPPPTTTPSPPTTTPITPPTSTTTLPPTTTPSP--PP 1645

Query: 797 VSAVPPLATNTVSPSLALLANNLSMPTSDLPPGASP 832 

+ P T T SP + T+ PP +P 

Sbjct: 1646 TTTTTPPPTTTPSPPTTTTPSPPITTTTTPPPTTTP 1681 

Score = 183 (27.5 bits), Expect = 3.4e-09, P = 3.4e-09
Identities = 91/390 (23%), Positives = 139/390 (35%)

Query: 326 PSAAISIQRPAQSRDVTTR-ITLPSHPALGTPKQQLHTMAQKTIFSTGTPVAAATVAPIL 384 

PS + P+ T TPSPT T I+T TP+ T +P + 

Sbjct: 1399 PSPPTTTPSPPPTTTTTLPPTTTPSPPTTTTTTPPPTTTPSPPITTTTTPLPTTTPSPPI 1458 

Query: 385 ATNTIPSATTAGSVSHTQAPTSTIVTMTVPSHSSHATAVTTSNIP--VAKVVPQQITHTS 442

+T T P TT S T P+ T + P+ ++ TT+ P + P T T 

Sbjct: 1459 STTTTPPPTTTPSPP-TTTPSPPTTTPSPPTTTTTTPPPTTTPSPPMTTPITPPASTTTL 1517 

Query: 443 PRIQPDYPAERSSLIPISGHRASP NPVAMETRSDNRP--SVPVQFQYFLPTYPPSAY 497 

P P++P SPP+T+P+P T PP+ 

Sbjct: 1518 PPTTTPSPPTTTTTTPPPTTTPSPPTTTPITPPTSTTTLPPTTTPSPPPTTTTTPPPTTT 1577 

Query: 498 PLAAHTYTPITSSVSTIRQYPVSAQAPNSAITAQTGVGVASTVHLNPMQL-MTVDASHAR 556 

P T TP +++T P + +P T T +T P +T S 

Sbjct: 1578 PSPPTTTTPSPPTITTTTPPPTTTPSPP TTTTTTPPPTTTPSPPTTTPITPPTSTTT 1634 

Query: 557 HIQGIQPAPISTQGIQPAPIGTPGIQPAPLGTQGIHSATPINTQGLQPAPMGTQQPQPEG 616 

P+P TPPTPPPT TT P P 

Sbjct: 1635 LPPTTTPSPPPTTTTTPPPTTTPS--P-PTTTTPSPPITTTTTPPPTTTPSSPITTTPSP 1691

Query: 617 KTSAVVLADGATIVANPISNPFSAAPAATTVVQTHSQSASTNAPAQGSSPRPSILRKKPA 676 

T+ + T ++PI+ + P++TT + +T +P SP + + P 

Sbjct: 1692 PTTTMTTPSPTTTPSSPITT--TTTPSSTTTPSPPPTTMTTPSPTTTPSPPTTTMTTLPP 1749

Query: 677 TDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVPP 715 

T + P + + P +++ T S + PT P 
Sbjct: 1750 TTTSSPLT TTPLPPSITPPTFSPFSTTTPTTPCVP 1784 

Score = 176 (26.4 bits), Expect = 1.8e-07, P = 1.8e-07
Identities = 101/402 (25%), Positives = 142/402 (35%)

Query: 345 ITLPSHPALGTPKQQLHTMAQKTIFSTGTPVAAATVAPILATNTIPSATTAGSVSHTQAP 404 

IT PS P TP T +T +P T P T P TT + T P 

Sbjct: 1396 ITTPSPPTT-TPSPPPTTTTTLPPTTTPSPPTTTTTTPPPTTTPSPPITTTTTPLPTTTP 1454 

Query: 405 TSTIVTMTVPSHSSHATAVTTS-NIPVAKVVPQQITHTSPRIQPDYPAERSSLIPISGHR 463

+ I T T P ++ + TT+ + P P T T+P P PI+ 

Sbjct: 1455 SPPISTTTTPPPTTTPSPPTTTPSPPTTTPSPPTTTTTTP--PPTTTPSPPMTTPITPP- 1511

Query: 464 ASPNPVAMETRSDNRPSVPVQFQYFLPTYPPSAYPLAAHTYTPITSSVSTIRQYPVSAQA 523

AS + T PS P T PP+ P + T TPIT ST P + + 

Sbjct: 1512 ASTTTLPPTTT PSPPTTTT TTPPPTTTP-SPPTTTPITPPTSTTTLPPTTTPS 1563 

Query: 524 PNSAITAQ TGVGVASTVHLNPMQLMTVDASHARHIQGIQPAPISTQGIQPAPIGTP 579 

P T T +T +P + T P+P +T P P TP 

Sbjct: 1564 PPPTTTTTPPPTTTPSPPTTTTPSPPTITTTTPPPTT TPSPPTTTTTTPPPTTTP 1618 

Query: 580 G IQPAPLGTQGIHSAT PINTQGLQPAPMGTQQPQPEGKTSAVVLADGATIV 630 

IPPT + T PT PPTP S + 

Sbjct: 1619 SPPTTTPITP-PTSTTTLPPTTTPSPPPTTTTTPPPTTTPSPPTTTTPSPPITTTTTPPP 1677 

Query: 631 ANPISNPFSAAPAA-TTVVQTHSQSASTNAP-AQGSSPRPSILRKKPATDGAKPKSEIHV 688

S+P + P+ TT + T S + + ++P ++P + P T P 
Sbjct: 1678 TTTPSSPITTTPSPPTTTMTTPSPTTTPSSPITTTTTPSSTTTPSPPPTTMTTPSP T 1734 

Query: 689 SMATPVTVSMETVSNQNNDQPTIAVPPTAQQPPPTIPTMIAAASPPSQPAVALSTIPG 746 

+ +P T +M T+ P P PPT + + P+ P V L G

Sbjct: 1735 TTPSPPTTTMTTLPPTTTSSPLTTTPLPPSITPPTFSPF--STTTPTTPCVPLCNWTG 1790

Score = 168 (25.2 bits), Expect = 9.3e-08, P = 9.3e-08
Identities = 89/387 (22%), Positives = 133/387 (34%) 

Query: 448 DYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFLPTYPPSAYPLAAHTYTPI 507 

DY + P+ +P+P T + + PP PTPSP TP 

Sbjct: 1381 DYKIRVNCCWPMDKCITTPSP PTTTPSPP--PTTTTTLPPTTTPSP-PTTTTTTPPP 1434

Query: 508 TSSVS TIRQYPVSAQAPNSAITAQTGVGVASTVHLNPMQLMTVDASHARHIQGIQPA 564 

T++ S T P+ P+ I+ T +T P T + P+
Sbjct: 1435 TTTPSPPITTTTTPLPTTTPSPPISTTTTPPPTTT PSPPTTTPSPPTT TPS 1485 

Query: 565 PISTQGIQPAPIGTPGI-QPAPLGTQGIHSATPINTQGLQPAPMGTQQPQ PEGKTSA 620 

P +T P P TP P+ + P T P TP P T+ 

Sbjct: 1486 PPTTTTTTPPPTTTPSPPMTTPITPPASTTTLPPTTTPSPPTTTTTTPPPTTTPSPPTTT 1545 

Query: 621 VVLADGATIVANPISNPFSAAPAATTVVQTHSQSA-STNAPAQGS SPRPSILRKKP 675

+ +T P + P TT T + S +T P+ + +P P+ P 

Sbjct: 1546 PITPPTSTTTLPPTTTPSPPPTTTTTPPPTTTPSPPTTTTPSPPTITTTTPPPTTTPSPP 1605

Query: 676 ATDGAKPKSEIHVS--MATPVTVSMETVSNQNNDQPTIAVPPTAQQPPPTIPTMIAAASP 733

TP S TP+T T + P+ P T PPPT + 

Sbjct: 1606 TTTTTTPPPTTTPSPPTTTPITPPTSTTTLPPTTTPS-PPPTTTTTPPPTTTPSPPTTTT 1664 

Query: 734 PSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGP PVPEIKVKEEVE 789 

PS P +T P + PITT + P ++T ++ P P 

Sbjct: 1665 PSPPITTTTTPPPTTTPSSPITTTPSPPTTTMTTPSPTTTPSSPITTTTTPSSTTTPSPP 1724 

Query: 790 PMDIMRPVSAVPPLATNTVSPSLALLANNLSMPTSDLPPGASP 832 

P + P P T+L ++T+ LPP +P 

Sbjct: 1725 PTTMTTPSPTTTPSPPTTTMTTLPPTTTSSPLTTTPLPPSITP 1767

Score = 154 (23.1 bits), Expect = 2.7e-06, P = 2.7e-06
Identities = 70/277 (25%), Positives = 92/277 (33%)

Query: 565 PISTQGIQPAPIGTPGIQPAPLGTQGIHSATPINTQGLQPAPMGTQQPQPEGKTSAVVLA 624

PIST PPTPPPT + TP PTPPT + 

Sbjct: 1457 PISTT-TTPPPTTTPS--P-PTTTPSPPTTTPSPPTTTTTTPPPTTTPSPPMTTP--ITP 1510

Query: 625 DGATIVANPISNPFSAAPAATTVVQTHSQSASTNAP AQGSSPRPSILRKKPATDGA 680

+T P + P TT T + S T P ++ P+ P T 

Sbjct: 1511 PASTTTLPPTTTPSPPTTTTTTPPPTTTPSPPTTTPITPPTSTTTLPPTTTPSPPPTTTT 1570 

Query: 681 KPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVPPTAQQ--PPPTIPTMIAAASPPSQPA 738

P S T T S T++ T PPT PPPT T + P P 

Sbjct: 1571 TPPPTTTPSPPTTTTPSPPTITTTTPPPTTTPSPPTTTTTTPPPTT-TPSPPTTTPITPP 1629 

Query: 739 VALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEVEPMDIMRPVS 798 

+ +T+P +PP TT PPP+ T ++ PP+ + 

Sbjct: 1630 TSTTTLPPTTTPSPPPTT-TTTPPPTTTPSPPTTTTPSPPITTTTTPPPTTTPSSPITTT 1688 

Query: 799 AVPPLATNTV SPSLALLANNL--SMPTSDLPPGASPRKKP 836

PP T T +PS + S T PP P 

Sbjct: 1689 PSPPTTTMTTPSPTTTPSSPITTTTTPSSTTTPSPPPTTMTTPSP 1733 

Score = 148 (22.2 bits), Expect = 1.1e-05, P = 1.1e-05
Identities = 62/254 (24%), Positives = 89/254 (35%) 

Query: 583 PAPLGTQGIHSATPINTQGLQPAPMGTQQPQPEGKTSAV VLADGATIVANPISNP 637

P+P T SPTLP TPPT+ +T P+ 

Sbjct: 1399 PSPPTTTP--SPPPTTTTTLPP TTTPSPPTTTTTTPPPTTTPSPPITTTTTPLPTT 1452

Query: 638 FSAAPAATTVVQTHSQSASTNAPAQGSSPRPSILRKKPATDGAKPKSEIHVS--MATPVT 695

+ P +TT T+++P SPP+ PT P SM TP+T 

Sbjct: 1453 TPSPPISTTT--TPPPTTTPSPPTTTPSP-PTTTPSPPTTTTTTPPPTTTPSPPMTTPIT 1509

Query: 696 VSMETVSNQNNDQPTIAVPPTAQQPPPTIPTMIAAASPPSQPAVALSTIPGAVPITPPIT 755 

T + P+ T PP T P+ + P P + +T+P +PP T 

Sbjct: 1510 PPASTTTLPPTTTPSPPTTTTTTPPPTTTPS--PPTTTPITPPTSTTTLPPTTTPSPPPT 1567

Query: 756 TIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEVEPMDIMRPVSAVPPLATNTVSPSLALL 815 

T PPP+ T ++ PP + PP T P+ + 

Sbjct: 1568 T-TTTPPPTTTPSPPTTTTPSPPTITTTTPPPTTTPSPPTTTTTTPPPTTTPSPPTTTPI 1626 

Query: 816 ...

T+ LPP +P

Sbjct: 1627 ...

Score = 131 (19.7 bits), Expect = 1.2e-03, P = 1.2e-03
Identities = 112/492 (22%), Positives = 174/492 (35%)

Query: 96 ...

Sbjct: 3977 ...

Query: 155 ...

Sbjct: 4037 ...

Query: 213 ...--PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268

P+ + P +++ +TT T T P I

Sbjct: 4096 ...

Query: 269 IHQPIQSRPPVTTSNAIPPA--VVATVSATRAQSPVITTTA--AHATDSALSRPTLSIQH 324

+ PT P + T+ T +PTT H + + ++TS

Sbjct: 4156 ...

Query: 325 ...

+ T T + T T++

Sbjct: 4216 ...-EMTSTAPPSTPTAPTTTSGGHTLS 4269

Query: 382 ...

P +TTP TTG++ + APT + V T S

Sbjct: 4270 PPPSTTTSPPGTPTRGTTTGSSSAPTPSTVQTTTTS- -AWTPTPTPLS- -TPSIIR 4321

Query: 440 HTSPRIQPDYPAERSSLI PISGHRASPNP-VAMETRSDN RPSVPVQFQYFLPTYP- 493

T ++P YP+ ++ +P V T D S+ +++ + P

Sbjct: 4322 TTG--LRP-YPSSVLICCVLNDTYYAPGEEVYNGTYGDTCYFVNCSLSCTLEFYNWSCPS 4378

Query: 494 -PSAYPLAAHTYTPITSSVSTIRQYPVSAQAPNSAITAQTGVGVASTVHLNPMQLMTVDA 552

PSP+ + TPSS+ P P TL + T

Sbjct: 4379 TPSPTPTPSKS-TPTPSKPSSTPSKPTPGTKPPECPDFDPPRQENETWWLCDCFMATCKY 4437

Query: 553 ...

P P + G+QP +

Sbjct: 4438 ...

Score = 117 (17.6 bits), Expect = 1.8e-02, P = 1.8e-02
Identities = 41/156 (26%), Positives = 55/156 (35%)

Query: 710 ...

Sbjct: 1398 ...

Query: 770 ...

Sbjct: 1457 ...

Query: 830 ...

Sbjct: 1505 ...

Score = 61 (9.2 bits), Expect = 1.6e-09, P = 1.6e-09
Identities = 23/93 (24%), Positives = 41/93 (44%)

Query: 397 ...-PQQITHTSPRIQPDYPAE 452

S++ + +T T+T+P+ + T TT+ P + V P+ SI D+P+

Sbjct: 1257 ...

Query: 453 ...

Sbjct: 1317 ...

Score = 50 (7.5 bits), Expect = 8.0e-09, P = 8.0e-09
Identities = 16/41 (39%), Positives = 19/41 (46%)

Query: 334 ...

RP+ TT ITLP+ P T+ ST TP

Sbjct: 1261 ...

Score = 46, P = 5.4e-08
Positives = 37/106 (34%)

Query: 324 ...

+PP A++ +S

Sbjct: 1196 YPPGPlSVPTEETCKSCVCTNSSQVVCRPEEGKILNQTQDGAFCYWEICGPNGTVEKHFNI 1255

Query: 384 LATNTIPSA-TTAGSVSHTQAPTSTIVTMTVPSHSSHATAVTTSNI 428

+ T PS TT +++ PTS T T + +S TT + 

Sbjct: 1256 CSITTRPSTLTTFTTITLPTTPTSFTTTTTTTTPTSSTVLSTTPKL 1301 

Score = 44 (6.6 bits), Expect = 8.7e-08, P = 8.7e-08
Identities = 14/34 (41%), Positives = 17/34 (50%)

Query: 478 RPSVPVQFQYF-LPTYPPSAYPLAAHTYTPITSSV 511 

RPS F LPT P S + T TP +S+V 

Sbjct: 1261 RPSTLTTFTTITLPTTPTS-FTTTTTTTTPTSSTV 1294 



Pedant information for DKFZphtes3_2all, frame 2 



Report for DKFZphtes3_2all.2



[LENGTH] 1048

[MW] 110324.04

[pI] 9.83

[HOMOL] PIR: 147141 gastric mucin (clone PGM-2A) - pig (fragment) 8e-15

[FUNCAT] 30.90 extracellular/secretion proteins [S. cerevisiae, YIR019c] 1e-09

[FUNCAT] 30.01 organization of cell wall [S. cerevisiae, YIR019c] 1e-09

[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YIR019c] 1e-09

[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YDR420w] 4e-09

[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YDR420w] 4e-09

[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YJR151c] 4e-06

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YGR014w] 1e-05

[FUNCAT] 11.01 stress response [S. cerevisiae, YHL028w] 1e-04

[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YHL028w] 1e-04

[EC] 3.2.1.3 Glucan 1,4-alpha-glucosidase 3e-08

[PIRKW] glycosidase 3e-08

[PIRKW] transmembrane protein 3e-08

[PIRKW] polysaccharide degradation 3e-08

[PIRKW] glycoprotein 9e-08

[PIRKW] calcium binding 9e-08

[PIRKW] hydrolase 3e-08

[PIRKW] cytoskeleton 7e-08

[SUPFAM] equine herpesvirus glycoprotein X 2e-07

[SUPFAM] yeast glucan 1,4-alpha-glucosidase homolog 3e-08

[SUPFAM] polymorphic epithelial mucin 7e-08

[SUPFAM] glucan 1,4-alpha-glucosidase homology 3e-08

[SUPFAM] equine herpesvirus 1 glycoprotein homology 2e-07

[PROSITE] MYRISTYL 9

[PROSITE] AMIDATION 1

[PROSITE] CAMP_PHOSPHO_SITE 2

[PROSITE] CK2_PHOSPHO_SITE 10

[PROSITE] PKC_PHOSPHO_SITE 12

[PROSITE] ASN_GLYCOSYLATION 3

[KW] Irregular

[KW] LOW_COMPLEXITY 20.04 %

SEQ MGPPRHPQAGEIEAGGAGGGRRLQVEMSSQQFPRLGAPSTGLSQAPSQIANSGSAGLINP 

SEG xxxxxxxxxxxx 

PRD ccccccccccccccccccccceeeeeeccccccccccccccccccccccccccccccccc 

SEQ AATVNDESGRDSEVSAREHMSSSSSLQSREEKQEPVVVRPYPQVQMLSTHHAVASATPVA
SEG xxxxx xxxxxxxxxxxx 



PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ VTAPPAHLTPAVPLSFSEGLMKPPPKPTMPSRPIAPAPPSTLSLPPKVPGQVTVTMESSI 

SEG xxxxxxxxxxxxx xxxxxxxxxx . . xxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccceeeccccc 

SEQ PQASAIPVATISGQQGHPSNLHHIMTTNVQMSIIRSNAPGPPLHIGASHLPRGAAAAAVM 

SEG xxxxx. . 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ SSSKVTTVLRPTSQLPNAATAQPAVQHIIHQPIQSRPPVTTSNAIPPAVVATVSATRAQS 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ PVITTTAAHATDSALSRPTLSIQHPPSAAISIQRPAQSRDVTTRITLPSHPALGTPKQQL 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HTMAQKTIFSTGTPVAAATVAPILATNTIPSATTAGSVSHTQAPTSTIVTMTVPSHSSHA

SEG xxxxxxxxxx xxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ TAVTTSNIPVAKVVPQQITHTSPRIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPS 

SEG xxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccceeeecccccccc 

SEQ VPVQFQYFLPTYPPSAYPLAAHTYTPITSSVSTIRQYPVSAQAPNSAITAQTGVGVASTV 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HLNPMQLMTVDASHARHIQGIQPAPISTQGIQPAPIGTPGIQPAPLGTQGIHSATPINTQ 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc

SEQ GLQPAPMGTQQPQPEGKTSAVVLADGATIVANPISNPFSAAPAATTVVQTHSQSASTNAP 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ AQGSSPRPSILRKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVPPTAQQP 

SEG xxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ PPTIPTMIAAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVP 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccceeeccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ EIKVKEEVEPMDIMRPVSAVPPLATNTVSPSLALLANNLSMPTSDLPPGASPRKKPRKQQ 

SEG xxxxxxxxxx xxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HVISTEEGDMMETNSTDDEKSTAKSLLVKAEKRKSPPKEYIDEEGVRYVPVRPRPPITLL 

SEG xxxxxxxxxxx. . . . 

PRD ccccccccccccccccccccchhhhhhhhhccccccccccccccccccccccccccccee 

SEQ RHYRNPWKAAYHHFQRYSDVRVKEEKKAMLQEIANQKGVSCRAQGWKVHLCAAQLLQLTN 

SEG 

PRD eeccccchhhhhhhccccchhhhhhhhhhhhhhhhhccceeecccceeehhhhhhhhhhc 

SEQ LEHDVYERLTNLQEGIIPKKKAATDDDLHRINELIQGNMQRCKLVMDQISEARDSMLKVL 

SEG 

PRD cchhhhhhhhhhhceeeeccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ DHKDRVLKLLNKNGTVKKVSKLKRKEKV 

SEG xxxxxxxxxxxxx 

PRD hhhhhhhhhhccccceeeeeeeeccccc 
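
The SEQ/SEG/PRD triplets above pair each block of residues with a low-complexity mask (x) and a per-residue secondary-structure prediction. As a rough illustration, the following Python sketch tallies the prediction states for the last triplet; it assumes the usual reading of the letters (h = helix, e = extended strand, c = coil), which the report itself does not spell out, and the function name `state_composition` is hypothetical.

```python
from collections import Counter

# Last SEQ/PRD pair of the report, copied from the listing above.
SEQ = "DHKDRVLKLLNKNGTVKKVSKLKRKEKV"
PRD = "hhhhhhhhhhccccceeeeeeeeccccc"

# Assumed meaning of the PRD letters (not stated in the report itself).
STATE_NAMES = {"h": "helix", "e": "strand", "c": "coil"}


def state_composition(prd):
    """Hypothetical helper: count residues per predicted state."""
    counts = Counter(prd)
    return {STATE_NAMES[state]: counts.get(state, 0) for state in STATE_NAMES}


if __name__ == "__main__":
    # Every residue should carry exactly one prediction character.
    assert len(SEQ) == len(PRD)
    print(state_composition(PRD))
    # 17 full lines of 60 residues plus this final line of 28 give the
    # reported [LENGTH] of 1048.
    print(17 * 60 + len(SEQ))
```

The same tally could be run over the concatenated PRD lines once the spacing damage in the earlier blocks has been repaired.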



Prosite for DKFZphtes3_2all.2 



PS00001 818->822 ASN_GLYCOSYLATION PDOC00001
PS00001 854->858 ASN_GLYCOSYLATION PDOC00001
PS00001 1033->1037 ASN_GLYCOSYLATION PDOC00001
PS00004 872->876 CAMP_PHOSPHO_SITE PDOC00004
PS00004 1037->1041 CAMP_PHOSPHO_SITE PDOC00004
PS00005 68->71 PKC_PHOSPHO_SITE PDOC00005
PS00005 75->78 PKC_PHOSPHO_SITE PDOC00005
PS00005 242->245 PKC_PHOSPHO_SITE PDOC00005
PS00005 342->345 PKC_PHOSPHO_SITE PDOC00005
PS00005 355->358 PKC_PHOSPHO_SITE PDOC00005
PS00005 442->445 PKC_PHOSPHO_SITE PDOC00005
PS00005 513->516 PKC_PHOSPHO_SITE PDOC00005
PS00005 665->668 PKC_PHOSPHO_SITE PDOC00005
PS00005 831->834 PKC_PHOSPHO_SITE PDOC00005
PS00005 862->865 PKC_PHOSPHO_SITE PDOC00005
PS00005 940->943 PKC_PHOSPHO_SITE PDOC00005
PS00005 1035->1038 PKC_PHOSPHO_SITE PDOC00005
PS00006 63->67 CK2_PHOSPHO_SITE PDOC00006
PS00006 68->72 CK2_PHOSPHO_SITE PDOC00006
PS00006 75->79 CK2_PHOSPHO_SITE PDOC00006
PS00006 88->92 CK2_PHOSPHO_SITE PDOC00006
PS00006 135->139 CK2_PHOSPHO_SITE PDOC00006
PS00006 473->477 CK2_PHOSPHO_SITE PDOC00006
PS00006 844->848 CK2_PHOSPHO_SITE PDOC00006
PS00006 855->859 CK2_PHOSPHO_SITE PDOC00006
PS00006 959->963 CK2_PHOSPHO_SITE PDOC00006
PS00006 984->988 CK2_PHOSPHO_SITE PDOC00006
PS00008 15->21 MYRISTYL PDOC00008
PS00008 16->22 MYRISTYL PDOC00008
PS00008 36->42 MYRISTYL PDOC00008
PS00008 233->239 MYRISTYL PDOC00008
PS00008 372->378 MYRISTYL PDOC00008
PS00008 533->539 MYRISTYL PDOC00008
PS00008 535->541 MYRISTYL PDOC00008
PS00008 590->596 MYRISTYL PDOC00008
PS00008 768->774 MYRISTYL PDOC00008
PS00009 19->23 AMIDATION PDOC00009



(No Pfam data available for DKFZphtes3_2all.2)

DKFZphtes3_2al7



group: metabolism 

DKFZphtes3_2al7 encodes a novel 574 amino acid protein without similarity to known proteins. 

The novel protein contains a thiol protease cys pattern. Eukaryotic thiol proteases (EC 
3.4.22.-) are a family of proteolytic enzymes containing an active site cysteine. Cathepsins 
belong to this protease family. 

The new protein can find application in the modulation of proteolytic processes and as a new
enzyme for proteomic analysis and biotechnological production processes.



unknown 

complete cDNA, complete cds, EST hits 

Sequenced by EMBL 

Locus: unknown 

Insert length: 2312 bp 

Poly A stretch at pos. 2300, polyadenylation signal at pos. 2273 



1 GTTTTCACCT GATCATTAGA AACTAATGAA ACACCTTTTA AGTCTTATGA 
51 ATTCAGGTTA CACTGTTTTC CAGATGCCTT GGCAGCTGGT ACAGGGCCTC 
101 TGAAAAATGG AACCAAATTC TCTGAGGACT AAAGTCCCAG CTTTCTTATC 
151 TGATTTGGGG AAGGCCACAT TGAGGGGAAT CAGAAAGTGT CCCCGATGTG 
201 GCACATACAA TGGAACCCGG GGACTGAGCT GTAAGAACAA GACATGTGGA 
251 ACCATATTCC GCTACGGTGC ACGCAAGCAG CCTAGTGTTG AAGCTGTCAA 
301 AATCATTACA GGCTCTGATC TTCAGGTCTA CTCAGTGCGG CAAAGAGACC 
351 GGGGCCCTGA TTACCGATGC TTTGTGGAGC TCGGGGTTTC AGAGACAACA 
401 ATCCAGACAG TGGATGGGAC GATCATCACT CAGCTGAGCT CTGGACGGTG 
451 TTATGTCCCC TCATGCCTGA AAGCTGCCAC TCAAGGCGTT GTGGAAAACC 
501 AGTGCCAGCA CATCAAGCTG GCGGTGAACT GCCAGGCAGA GGCCACCCCT 
551 CTGACCCTGA AGAGCTCGGT CCTGAATGCA ATGCAGGCCT CCCCGGAAAC 
601 CAAACAGACC ATCTGGCAGT TGGCCACGGA ACCCACAGGT CCTCTGGTGC 
651 AGAGAATTAC TAAAAACATC TTGGTGGTGA AATGCAAGGC AAGCCAGAAG 
701 CACAGTTTGG GGTATTTGCA TACATCTTTT GTGCAGAAAG TCAGTGGCAA 
751 AAGCTTGCCT GAGCGCCGCT TCTTCTGCTC CTGTCAGACT CTGAAATCGC 
801 ACAAGTCAAA TGCCTCCAAG GATGAGACAG CCCAGAGATG CATTCATTTC
851 TTTGCTTGCA TCTGTGCCTT TGCCAGTGAT GAGACACTGG CTCAGGAATT 
901 CTCAGACTTC CTAAATTTTG ATTCCAGCGG TCTTAAAGAG ATTATTGTAC 
951 CCCAGTTAGG TTGCCATTCA GAATCAACAG TATCTGCTTG TGAGTCTACT 
1001 GCCTCTAAGT CAAAGAAGAG GAGAAAGGAT GAAGTATCTG GTGCACAGAT 
1051 GAACAGTTCA CTACTGCCTC AAGATGCAGT GAGCAGTAAT CTAAGGAAAA 
1101 GTGGCCTGAA AAAGCCTGTG GTTGCTTCCT CGTTAAAAAG GCAGGCCTGT 
1151 GGTCAGCTGT TAGATGAGGC ACAAGTGACT TTATCCTTCC AAGACTGGCT 
1201 GGCCAGTGTC ACAGAACGCA TCCATCAAAC CATGCACTAT CAGTTTGATG
1251 GCAAACCAGA ACCATTGGTG TTCCACATTC CTCAGTCATT TTTTGATGCC 
1301 CTGCAACAAA GAATATCTAT AGGAAGTGCA AAAAAACGGC TCCCCAACTC 
1351 CACCACAGCT TTTGTTCGGA AAGATGCCTT GCCACTGGGA ACCTTTTCCA 
1401 AGTATACTTG GCATATCACT AATATCCTGC AAGTTAAACA AATCTTAGAT
1451 ACCCCAGAGA TGCCCTTGGA AATCACCCGT AGCTTTATCC AGAACCGAGA 
1501 TGGGACTTAT GAGCTATTTA AATGCCCTAA AGTGGAAGTA GAAAGCATAG 
1551 CAGAAACCTA CGGTCGTATA GAAAAACAAC CAGTGCTGCG ACCCTTGGAA 
1601 CTAAAAACTT TTCTCAAAGT TGGCAACACT TCCCCAGATC AAAAGGAGCC 
1651 AACACCTTTC ATCATCGAGT GGATCCCAGA TATCCTTCCC CAATCTAAGA 
1701 TTGGCGAGCT GCGGATCAAG TTTGAGTATG GCCACCACCG GAATGGGCAT 
1751 GTGGCGGAGT ACCAAGACCA GCGGCCCCCC TTGGACCAGC CCTTGGAACT 
1801 GGCCCCTCTG ACCACTATTA CTTTCCCTTA AAGCAAAACA AGATAATAAT 
1851 CTTTTGCTGC TTAATTTGCA CATCCCCACC CCTTGACAAC TTTAAATGCT 
1901 AGTTAGGCAC TTAGATGGCC CTGTTCCTTG GTAAACTGCT CTTAGCTAAG 
1951 ATGCAAATTC TCAGTGCTTT CAAGTGGATT CTGTTGAAGA AAATCTCTTG 
2001 TAAATAGCCT TTTTGATGCT GCTGTGTACA GTCTTCATTA TGCATTGGGC 
2051 AGTATTTCTG GCTAGAGTTT TAAAAGGAAC AGAAAGAAAA CCAGCTTATT 
2101 TTCCTTCTTA CGGACTCATC TTTAGCGTTT ATTTCAACCT TTTGCTAATT 
2151 CTCTGAGAAA TCTGCAGCAC TCAGCCATAC ACCAACAGTG TTGGAAAGTT 
2201 AACACCCTGG TTAGGGCAGA ATGTTAAAGA CCATCTTGGC AGAGTTCCAG 
2251 CCACGCTCTT TATTCTGTTC TCAAATAAAG CAGTGTCACT AGTTTTTCCT 
2301 AAAAAAAAAA AA 



BLAST Results 



No BLAST result

Medline entries



No Medline entry 



Peptide information for frame 2 



ORF from 107 bp to 1828 bp; peptide length: 574 
Category: putative protein 
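
The ORF coordinates and peptide lengths quoted in these reports can be cross-checked with simple arithmetic. The sketch below assumes the coordinates are 1-based and inclusive and that the stop codon is not counted, which is consistent with the three ORFs quoted in this section (107 to 1828 bp, 112 to 933 bp, 472 to 3018 bp); the function name is illustrative only and is not part of the original annotation pipeline.

```python
def orf_peptide_length(start: int, end: int) -> int:
    """Return the number of codons in an ORF.

    Assumes 1-based, inclusive nucleotide coordinates that exclude
    the stop codon (an assumption inferred from the listing).
    """
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError(f"ORF span {span} is not a multiple of 3")
    return span // 3


if __name__ == "__main__":
    # (start, end) pairs as quoted in the three reports of this section
    for name, (start, end) in {
        "DKFZphtes3_2al7": (107, 1828),
        "DKFZphtes3_2dl5": (112, 933),
        "DKFZphtes3_2el2": (472, 3018),
    }.items():
        print(name, orf_peptide_length(start, end))
```

Under these assumptions the computed codon counts agree with the peptide lengths stated in the reports (574, 274 and 849 residues).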



1 MEPNSLRTKV PAFLSDLGKA TLRGIRKCPR CGTYNGTRGL SCKNKTCGTI
51 FRYGARKQPS VEAVKIITGS DLQVYSVRQR DRGPDYRCFV ELGVSETTIQ
101 TVDGTIITQL SSGRCYVPSC LKAATQGVVE NQCQHIKLAV NCQAEATPLT
151 LKSSVLNAMQ ASPETKQTIW QLATEPTGPL VQRITKNILV VKCKASQKHS
201 LGYLHTSFVQ KVSGKSLPER RFFCSCQTLK SHKSNASKDE TAQRCIHFFA
251 CICAFASDET LAQEFSDFLN FDSSGLKEII VPQLGCHSES TVSACESTAS
301 KSKKRRKDEV SGAQMNSSLL PQDAVSSNLR KSGLKKPVVA SSLKRQACGQ
351 LLDEAQVTLS FQDWLASVTE RIHQTMHYQF DGKPEPLVFH IPQSFFDALQ
401 QRISIGSAKK RLPNSTTAFV RKDALPLGTF SKYTWHITNI LQVKQILDTP
451 EMPLEITRSF IQNRDGTYEL FKCPKVEVES IAETYGRIEK QPVLRPLELK
501 TFLKVGNTSP DQKEPTPFII EWIPDILPQS KIGELRIKFE YGHHRNGHVA
551 EYQDQRPPLD QPLELAPLTT ITFP



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2al7, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphtes3_2al7, frame 2



Report for DKFZphtes3_2al7.2



[LENGTH] 574

[MW] 64076.89

[pI] 9.15

[PROSITE] MYRISTYL 5

[PROSITE] CK2_PHOSPHO_SITE 9

[PROSITE] PKC_PHOSPHO_SITE 14

[PROSITE] ASN_GLYCOSYLATION 5

[PROSITE] THIOL_PROTEASE_CYS 1

[KW] Alpha_Beta



SEQ MEPNSLRTKVPAFLSDLGKATLRGIRKCPRCGTYNGTRGLSCKNKTCGTIFRYGARKQPS

PRD ccccccccccchhhhhcccchhhhhcccccccccccccccccccccccceeeeccccccc 

SEQ VEAVKIITGSDLQVYSVRQRDRGPDYRCFVELGVSETTIQTVDGTIITQLSSGRCYVPSC 

PRD ceeeeeeecccceeeeeccccccccceeeeeecccccceeeccceeeeeecccccccchh 

SEQ LKAATQGVVENQCQHIKLAVNCQAEATPLTLKSSVLNAMQASPETKQTIWQLATEPTGPL 

PRD hhhhhhhhcchhhhheeehhhhhhhcccccchhhhhhhhhcccchhhhhhhhhcccccch 

SEQ VQRITKNILVVKCKASQKHSLGYLHTSFVQKVSGKSLPERRFFCSCQTLKSHKSNASKDE 

PRD hhhhhhheeeeeecccccccccccceeeeeeecccccccceeeecccccccccccccccc 

SEQ TAQRCIHFFACICAFASDETLAQEFSDFLNFDSSGLKEIIVPQLGCHSESTVSACESTAS 

PRD hhhhhhhhhhhhhhhhhchhhhhhhhhhhccccccceeeeeecccccccceeeccccccc 

SEQ KSKKRRKDEVSGAQMNSSLLPQDAVSSNLRKSGLKKPVVASSLKRQACGQLLDEAQVTLS

PRD ccchhhhhccccccccccccccccchhhhhhhccccceeehhhhhhhhhchhhhhhhhhh 

SEQ FQDWLASVTERIHQTMHYQFDGKPEPLVFHIPQSFFDALQQRISIGSAKKRLPNSTTAFV 

PRD hhhhhhhhhhhhhhhhhhhcccccccceeehhhhhhhhhhhhhhhhcccccccccceeee 

SEQ RKDALPLGTFSKYTWHITNILQVKQILDTPEMPLEITRSFIQNRDGTYELFKCPKVEVES 

PRD ecccccccccceeeeehhhhhhhhhhhccccccccceeeeeeccccceeeecccceeeeh 

SEQ IAETYGRIEKQPVLRPLELKTFLKVGNTSPDQKEPTPFIIEWIPDILPQSKIGELRIKFE 

PRD hhhhhhhhhccccccccccceeeeecccccccccccceeeeecccccccccccceeeeee

SEQ YGHHRNGHVAEYQDQRPPLDQPLELAPLTTITFP

PRD ecccccceeeeccccccccccccccccceeeccc



Prosite for DKFZphtes3_2al7.2 



PS00001   35->39      ASN_GLYCOSYLATION    PDOC00001
PS00001   44->48      ASN_GLYCOSYLATION    PDOC00001
PS00001   235->239    ASN_GLYCOSYLATION    PDOC00001
PS00001   316->320    ASN_GLYCOSYLATION    PDOC00001
PS00001   414->418    ASN_GLYCOSYLATION    PDOC00001
PS00005   5->8        PKC_PHOSPHO_SITE     PDOC00005
PS00005   21->24      PKC_PHOSPHO_SITE     PDOC00005
PS00005   41->44      PKC_PHOSPHO_SITE     PDOC00005
PS00005   76->79      PKC_PHOSPHO_SITE     PDOC00005
PS00005   112->115    PKC_PHOSPHO_SITE     PDOC00005
PS00005   150->153    PKC_PHOSPHO_SITE     PDOC00005
PS00005   196->199    PKC_PHOSPHO_SITE     PDOC00005
PS00005   213->216    PKC_PHOSPHO_SITE     PDOC00005
PS00005   228->231    PKC_PHOSPHO_SITE     PDOC00005
PS00005   231->234    PKC_PHOSPHO_SITE     PDOC00005
PS00005   302->305    PKC_PHOSPHO_SITE     PDOC00005
PS00005   342->345    PKC_PHOSPHO_SITE     PDOC00005
PS00005   369->372    PKC_PHOSPHO_SITE     PDOC00005
PS00005   407->410    PKC_PHOSPHO_SITE     PDOC00005
PS00006   68->72      CK2_PHOSPHO_SITE     PDOC00006
PS00006   216->220    CK2_PHOSPHO_SITE     PDOC00006
PS00006   237->241    CK2_PHOSPHO_SITE     PDOC00006
PS00006   293->297    CK2_PHOSPHO_SITE     PDOC00006
PS00006   360->364    CK2_PHOSPHO_SITE     PDOC00006
PS00006   367->371    CK2_PHOSPHO_SITE     PDOC00006
PS00006   394->398    CK2_PHOSPHO_SITE     PDOC00006
PS00006   480->484    CK2_PHOSPHO_SITE     PDOC00006
PS00006   508->512    CK2_PHOSPHO_SITE     PDOC00006
PS00008   32->38      MYRISTYL             PDOC00008
PS00008   93->99      MYRISTYL             PDOC00008
PS00008   104->110    MYRISTYL             PDOC00008
PS00008   127->133    MYRISTYL             PDOC00008
PS00008   312->318    MYRISTYL             PDOC00008
PS00139   109->121    THIOL_PROTEASE_CYS   PDOC00126
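
The short Prosite site patterns reported above can be reproduced with regular expressions. The following is a minimal sketch, assuming the standard PROSITE definitions N-{P}-[ST]-{P} (PS00001), [ST]-x-[RK] (PS00005) and [ST]-x(2)-[DE] (PS00006), and assuming the listing's range convention is a 1-based start with an exclusive end (e.g. `35->39` for a four-residue match). The helper `scan` and the constant names are illustrative and are not taken from the original analysis software; only the first 60 residues of the peptide are scanned.

```python
import re

# PROSITE patterns translated to Python regular expressions (assumed forms)
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",    # PS00006: [ST]-x(2)-[DE]
}

# First 60 residues of the DKFZphtes3_2al7 frame 2 peptide
N_TERM = "MEPNSLRTKVPAFLSDLGKATLRGIRKCPRCGTYNGTRGLSCKNKTCGTIFRYGARKQPS"


def scan(seq: str, regex: str) -> list[tuple[int, int]]:
    """Return (start, end) pairs, 1-based start and exclusive end.

    A lookahead is used so that overlapping matches are all reported.
    """
    hits = []
    for match in re.finditer(f"(?=({regex}))", seq):
        start = match.start() + 1
        hits.append((start, start + len(match.group(1))))
    return hits


if __name__ == "__main__":
    for name, regex in PATTERNS.items():
        print(name, scan(N_TERM, regex))
```

For this N-terminal fragment the sketch reports N-glycosylation sites starting at residues 35 and 44 and PKC sites starting at 5, 21 and 41, which agrees with the first rows of the table; no CK2 site falls within the first 60 residues.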



(No Pfam data available for DKFZphtes3_2al7.2)

DKFZphtes3_2dl5



group: testes derived 

DKFZphtes3_2dl5 encodes a novel 274 amino acid protein with similarity to 
C.elegans cosmid F25H2.1. 

The novel protein contains a Pfam predicted C2-domain. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif.

The new protein can find application in studying the expression profile of testis-specific
genes.

similarity to C.elegans F25H2.1 

complete cDNA, complete cds, EST hits 

Sequenced by EMBL 

Locus: unknown

Insert length: 3615 bp 

Poly A stretch at pos. 3603, polyadenylation signal at pos. 3578 

1 GCGGCGGCCT CGAGGTGACA ACTGTCTCCG TCGCAGGCTC CGGCGGGGGC 

51 GCAGGAGGTC GCCCGGCGCG TCACTGTCGG GTCGGCGAGC CACGGGGGCC 

101 GCCGCAGCAC CATGGCGACC ACCGTCAGCA CTCAGCGCGG GCCGGTGTAC 

151 ATCGGTGAGC TCCCGCAGGA CTTCCTCCGC ATCACGCCCA CACAGCAGCA 

201 GCGGCAGGTC CAGCTGGACG CCCAGGCGGC CCAGCAGCTG CAGTACGGAG 

251 GCGCAGTGGG CACCGTGGGC CGACTGAACA TCACGGTGGT ACAGGCAAAG 

301 TTGGCCAAGA ATTACGGCAT GACCCGCATG GACCCCTACT GCCGACTGCG 

351 CCTGGGCTAC GCGGTGTACG AGACGCCCAC GGCACACAAT GGCGCCAAGA 

401 ATCCCCGCTG GAATAAGGTC ATCCACTGCA CGGTGCCCCC AGGCGTGGAC 

451 TCTTTCTATC TCGAGATCTT CGATGAGAGA GCCTTCTCCA TGGACGACCG 

501 CATTGCCTGG ACCCACATCA CCATCCCGGA GTCCCTGAGG CAGGGCAAGG 

551 TGGAGGACAA GTGGTACAGC CTGAGCGGGA GGCAGGGGGA CGACAAGGAG 

601 GGCATGATCA ACCTCGTCAT GTCCTACGCG CTGCTTCCAG CTGCCATGGT 

651 GATGCCACCC CAGCCCGTGG TCCTGATGCC AACAGTGTAC CAGCAGGGCG 

701 TTGGCTATGT GCCCATCACA GGGATGCCCG CTGTCTGTAG CCCCGGCATG 

751 GTGCCCGTGG CCCTGCCCCC GGCCGCCGTG AACGCCCAGC CCCGCTGTAG 

801 CGAGGAGGAC CTGAAAGCCA TCCAGGACAT GTTCCCCAAC ATGGACCAGG 

851 AGGTGATCCG CTCCGTGCTG GAAGCCCAGC GAGGGAACAA GGATGCCGCC 

901 ATCAACTCCC TGCTGCAGAT GGGGGAGGAG CCATAGAGCC TCTGCCTCGA 

951 TGCCGTTTTG CCCCCGCTCT TTGGACACGC CGACCCGGCG CTCCCCAAGG 

1001 AATGCTGTCC CAACAAGATT CCCGTGAAAG AGCACCCGTG TCGCCCCCTC 

1051 CCGTGGACTT CTGTGCCGCC CCGTCCACAC CTGTTCTTGG GTGCATGTGG 

1101 GTTTTCGGTT CCTGGCGGTC CAGGACGGGG CGGGGGCTCC CCTCCCATCT 

1151 CGTGCTGGGA GGTCTCAGCG CGCTCTCCTG TCCCTGGGAC GTGCGTCTCT 

1201 CCTTCTCATG CCGTTCTGGA AAATGCTCTT GCTGTAGAGA GCAGCTGCTT 

1251 CTGCCAGGGT GTTGGAGGTG GTGGAGCGCC TTCCGATTCC ATTCATGGCA 

1301 TTTTGTGATG TGATGTAATT GGAATAGAGC TGTTGATTTA AGGCACACAC 

1351 AATCCCTCAC ACTGTGGGTT TTTTTTAGAA CTTCCCAGAC GAAAACTCAC 

1401 GCCCTTGCCC TAACGCGCTT TGCTGTGAGC CTGGCCCCTG CCCAGGGCTT 

1451 GGGTCTGGTG AGCTGAGCAG CTTCCTGTGG ATGGTGTGGG GCCGGCCTCT 

1501 GGCCTGGCTC ACCTGGCCAC TGTCCAGCCA GCCTTGTGAC AGACTCCGGC 

1551 CTGAAGGCAG AATGAACCCA CACCTGGAGT GAGGAAGGGG GCCTGGCACG 

1601 GTTGGCCAGG CTCTGCCTGA TTGCCAGCCA GCGGGCATCT GAAGCCGGGT 

1651 CCTTCGCCCG CCGGAGGCTG CCGTCCGTCT CTCCTGCTGC GCTCGTGCCA 

1701 GCTCCGTGGG TGTCCTCCCA GGGAGCTTCT CTTCTCAACA GGCCTTGCGA 

1751 GGCTGGGGTG AGAGGTGATA GAGGCAGCAC TGTGCATGAT TCCGAGAGGG 

1801 TGTGGTGGCA CTGCCAGCCG ACTGCTGACA GCTTGGGAGC TGCTGTGCCC 

1851 AGGACGTGGG TTCAGCGTGG GCGAGGAAAG CCTGGCGAGC GTGGCCCTGT 

1901 AAAAGCTTTC TGAGGCGGGA GGCGCTCACT TACCTCTGAC TGCCTGGGCG 

1951 CTGCGTGTAG CATCTTGGCC TACAGGACAG ATTTTAGGTG ACACCTGGTT 

2001 ATGACAGTCA GAAATTTGAG AAGCTTCTCA CAAGTGATGC ACTTTAAATA 

2051 ATCTGCATGC CATTGAGACA CCTGCATGTC TGGTGTTTGT GGTTCAAGTG 

2101 TCTTGCCGCC GGCCTTCGGA TGTAAACCCA CTGATAACGG ACAGAAAGAG 

2151 AATGCCCACA AGTGGGTCTT CTGTGGAAGA TGCAGAAGGA GGAAGTTAGT 

2201 GCTTACATTT TAGTCTTTTT CTCCCTCAAA AAAATAGGTT AAGTTTCAGT 

2251 GCCAGCTAGA AAATACTGCT TTCTGCCATC GATTGGGGGT GGTTTTTGTC 

2301 AAATATACTG TTGATAAATA TTTATTTTTG TAAACTTGAA GTGTGTGGTG 

2351 GCCGTGGGGG AGGGACATGC TGGCAGCAGG CGCCTTCTTC AGCTGTGGGT 

2401 CCTAAAGGCC TTTGATCCTT TGAAGAAGAA AGACATGGTA TTTGTTCAGC 

2451 AGACGCCGAC CACTCAGACG GAGGGGCCCC TGGGATTCCC TGTCTCAGAT 

2501 GGCCTGGTCT TACGCCTGTG TAGATTTCTT CTCCATTGGG AATGAAGGTG 

2551 TCAGGCGGGA CTGGAACGTT CTAGATGGTA TGTTCCGTGA TATTAACAAC 

2601 TCTAACCCAG GACAGACCAC AAGCCACACT CAGAGGCCTC ACTGTGCTGG

2651 GGGCTTCGGT GTCCAGGCGC CCAGGTGTGG CCACCAGCAC CGGTTTCTGC

2701 CTTCGCGTTG CTGGGGTGCA GTGAGACTGC CACACGCGTG CACATGTGGC

2751 TCTGTGGGTG TCTCCTAGAG AGGACGTGGC CCCTGCTGCC AGCCCTTGAG

2801 CAGCCCGTGT GGGGGCCCGA GGGACCCACA CAGTGGGGGC CAGCCTCGCT

2851 GGAGGGAGAG CAACCCTTTG CCGATGACCA CGCTTGCCGC CATCTCTTAG

2901 TTTTCTTTTT CACAAGCGCT TTATTTTTTT AATAGACAAA TCACATTTTG

2951 CAAGGCCTTT AATTAAATAA GATTCTTCTT TCCTTCATTT TATGCTTTAT

3001 TTCCTGTTTG AAGGCTTACT GTAGAAGTGG CTTACTGTAG AAGCAGCTTG

3051 CTGAGCCCCT CCGAGCGGTC CCCAGAATTA GCTGGTTCAC AACCCCCACC

3101 CTCCCCCGCC CCCGCCTGTG TCAGGTGTGG ATGAGGTCGT CACACTCAGA

3151 AGGACAGGCT TGTCTGCCAG CTCACAAGGG GAGGCTGCAG TGGGTTTGGG

3201 AGCTGGGTTT AGGCCCCTGG TGTCTGAGGG CCCAGGCCTT GCCAGCCTCT

3251 GCTGCTCCTG CTCCTGGGTT TGAAGATGCA GGCCGATCGC CAGCTCCGTG

3301 GCAGCGGTCA CTAAGGACAG CCTGACTGTG CCATCTTGGA GCCTCAGGCG

3351 GGGCTCCGGA GATAGAAGAC AGGTCGCCGG AGGCTCCCCC TCCTCTCCTC

3401 TCCCCTCTGC AGATGCTCCC TGGGCGCTAC CCTGCAGGGT GCCAGGCAGG

3451 AGTGGTCTCA GAACGTGCGC TTCTGATTAT TTTACTGGGG TCCATTGTCC

3501 AGATTTTTCT TTGATTGTAA AATATATTTT TACTTTTTAG TCTTCTAATT

3551 TAATAAATGA TCCATATAAA AATAGAGAAA TAAAGTCCTT TAAGGGAAGG

3601 TTTAAAAAAA AAAAA



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 112 bp to 933 bp; peptide length: 274 
Category: similarity to unknown protein 
Classification: no clue 



1 MATTVSTQRG PVYIGELPQD FLRITPTQQQ RQVQLDAQAA QQLQYGGAVG 
51 TVGRLNITVV QAKLAKNYGM TRMDPYCRLR LGYAVYETPT AHNGAKNPRW 
101 NKVIHCTVPP GVDSFYLEIF DERAFSMDDR IAWTHITIPE SLRQGKVEDK 
151 WYSLSGRQGD DKEGMINLVM SYALLPAAMV MPPQPVVLMP TVYQQGVGYV 
201 PITGMPAVCS PGMVPVALPP AAVNAQPRCS EEDLKAIQDM FPNMDQEVIR 
251 SVLEAQRGNK DAAINSLLQM GEEP 
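
The peptide above is the conceptual translation of the ORF in the nucleotide listing. As an illustration, the sketch below translates the first 30 nucleotides of the ORF (positions 112 to 141 of the insert, read off the listing) with the standard genetic code. The compact codon table construction and the function name `translate` are choices made for this example, not something specified by the source.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    dna = dna.upper().replace(" ", "")
    peptide = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


if __name__ == "__main__":
    # Nucleotides 112..141 of the DKFZphtes3_2dl5 insert
    print(translate("ATGGCGACC ACCGTCAGCA CTCAGCGCGG G"))
```

The first ten residues obtained this way match the start of the listed peptide (MATTVSTQRG).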

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_2dl5, frame 1

TREMBL:CEF25H2_1 gene: "F25H2.1"; Caenorhabditis elegans cosmid F25H2,
N = 1, Score = 385, P = 1.1e-35



>TREMBL:CEF25H2_1 gene: "F25H2.1"; Caenorhabditis elegans cosmid F25H2 
Length = 457 

HSPs: 

Score = 385 (57.8 bits), Expect = 1.1e-35, P = 1.1e-35
Identities = 77/182 (42%), Positives = 118/182 (64%) 

Query: 4 TVSTQRGPVYIGELPQDFLRIT-PTQQQRQVQLDAQAAQQLQYGGAVGTVGRLNITVVQA 62 

TV+ +R V +GELP FLR+ PQQ+++Q+++ T GRL++T+++A 

Sbjct: 5 TVAERRRQVLVGELPPHFLRLAVPIQQTAEPEI-VQP-RMVSFVPP-NTRGRLSVTILEA 61 

Query: 63 KLAKNYGMTRMDPYCRLRLGYAVYETPTAHNGAKNPRWNKVIHCTVPPGVDSFYLEIFDE 122 

L KNYG+ RMDPYCR+R+G ++T AN + P WN+ ++ +P V+S Y++IFDE 
Sbjct: 62 NLVKNYGLVRMDPYCRVRVGNVEFDTNVAANAGRAPTWNRTLNAYLPMNVESIYIQIFDE 121 

Query: 123 RAFSMDDRIAWTHITIPESLRQGKVEDKWYSLSGRQGDDKEGMINLVMSYAL--LPAAMV 180

+AF D+ IAW HI +P ++ G D+++ LSG+QG+ KEGMI+L S+A LP
Sbjct: 122 KAFGPDEVIAWAHIMLPLAIFNGDNIDEYFQLSGQQGEGKEGMIHLHFSFAPIDLPLQQA 181

Query: 181 MPPQP 185
P +P 

Sbjct: 182 APAEP 186 

Score = 92 (13.8 bits), Expect = 1.8e-01, P = 1.7e-01
Identities = 26/68 (38%), Positives = 38/68 (55%)

Query: 194 QQGVGYVPITGMPAVCSPGMVPV--ALP--PAAVNAQPRCSEEDLKAIQDMFPNMDQEVI 249

QQG G + + +P +P+ A P PA +EED K IQ+MFP +D+EVI 

Sbjct: 156 QQGEGKEGMIHLHFSFAPIDLPLQQAAPAEPAPAPLPVEITEEDTKEIQEMFPIVDKEVI 215 

Query: 250 RSVLEAQR 257 

+ +LE +R 
Sbjct: 216 KCILEERR 223 
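
The identity and positive percentages in the HSP headers above follow directly from the match counts. A small sketch is given below; it assumes the percentages are truncated to whole numbers rather than rounded, which is inferred from the figures printed in this listing (for example 118/182 is shown as 64%), not from the BLAST documentation. The function name is hypothetical.

```python
def blast_percent(matches: int, aligned_length: int) -> int:
    """Whole-number percentage, truncated (assumed behaviour, see lead-in)."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return matches * 100 // aligned_length


if __name__ == "__main__":
    # (matches, aligned length) pairs from the two HSPs above
    for label, matches, length in [
        ("HSP 1 identities", 77, 182),
        ("HSP 1 positives", 118, 182),
        ("HSP 2 identities", 26, 68),
        ("HSP 2 positives", 38, 68),
    ]:
        print(f"{label}: {matches}/{length} ({blast_percent(matches, length)}%)")
```

With truncation, all four values agree with the percentages printed in the headers; conventional rounding would give 65% and 56% for the two "Positives" figures.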

Pedant information for DKFZphtes3_2dl5, frame 1 



Report for DKFZphtes3_2dl5.1

[LENGTH] 274

[MW] 30281.97

[pI] 5.68

[HOMOL] TREMBL:CEF25H2_1 gene: "F25H2.1"; Caenorhabditis elegans cosmid F25H2 4e-36

[PFAM] C2 domain

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 16.42 % 

SEQ MATTVSTQRGPVYIGELPQDFLRITPTQQQRQVQLDAQAAQQLQYGGAVGTVGRLNITVV 

SEG xxxxxxxxxxxxxxxxx 

PRD cccccccccceeeeeccccceeeecccchhhhhhhhhhhhhhhhhcccccceeeeceeeh 

SEQ QAKLAKNYGMTRMDPYCRLRLGYAVYETPTAHNGAKNPRWNKVIHCTVPPGVDSFYLEIF 

SEG 

PRD hhhhhhhhcccccccchhhhheeeeeecccccccccccccceeeeeccccccceeeeeec 

SEQ DERAFSMDDRIAWTHITIPESLRQGKVEDKWYSLSGRQGDDKEGMINLVMSYALLPAAMV 

SEG xxxxxxxx 

PRD cccccccccceeeeccccccccccccccceeeeeccccccccccceeeeehhhhhhhhhc 

SEQ MPPQPVVLMPTVYQQGVGYVPITGMPAVCSPGMVPVALPPAAVNAQPRCSEEDLKAIQDM

SEG xxxxxxxxxx xxxxxxxxxx 

PRD ccccceeeeeeeeecccccccccccceeecccccccccccceeeeccccchhhhhhhhhc 

SEQ FPNMDQEVIRSVLEAQRGNKDAAINSLLQMGEEP 

SEG 

PRD ccccchhhhhhhhhhhccccchhhhhhhhhhccc 
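
The LOW_COMPLEXITY keyword value can be related to the SEG rows above: it appears to be the fraction of residues masked with `x`. The sketch below assumes exactly that definition; the mask rows are reduced to their `x` runs as printed (17, 8, 10 and 10 residues), since the original column positions were lost in this copy. The function and variable names are illustrative.

```python
def low_complexity_percent(seg_rows: list[str], length: int) -> float:
    """Percentage of residues masked with 'x' in the SEG rows (assumed definition)."""
    masked = sum(row.count("x") for row in seg_rows)
    return round(100 * masked / length, 2)


# SEG mask rows for DKFZphtes3_2dl5.1, reduced to the printed runs of 'x'
SEG_ROWS = [
    "x" * 17,
    "",
    "x" * 8,
    "x" * 10 + " " + "x" * 10,
    "",
]

if __name__ == "__main__":
    print(low_complexity_percent(SEG_ROWS, 274))
```

For the 274-residue peptide, 45 masked residues give 16.42 %, the value listed under [KW] LOW_COMPLEXITY.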

(No Prosite data available for DKFZphtes3_2dl5.1)

Pfam for DKFZphtes3_2dl5.1

HMM_NAME C2 domain 

HMM *LtVrIIeARNLWkMDMnGf SDPYVKVdMdPdpkDtkKWKTkTiWNNGLN 
L++++++A+ + + M+ DPY+++ + + + +T T +N N 

Query 55 LN IT VVQAKLAKN YGMT- RMDPYCRLRLGYAV Y ETPTAHNGAKN 97 

HMM PVWNEEeFvFedlPyPdlqrkMLRFaVWDWDRFSRBDFIGHCi* 

P+WN + +P + + ++++D+ FS +D 1+ + 
Query 98 PRWN-KVIHCT-VPPGVDSF YLEIFDERAFSMDDRIAWTH 135

DKFZphtes3_2el2



group: Transcription Factors 

DKFZphtes3_2el2 encodes a novel 849 amino acid protein with similarity to zinc finger
proteins.

The new protein is a putative transcription factor with three C2H2 zinc fingers. Additionally,
a cytochrome c family heme-binding site signature is present in the protein; this signature is
otherwise found only in cytochrome c-related proteins.

The new protein can find application in modulating/blocking the expression of genes controlled 
by this transcription factor. 

similarity to finger proteins 

complete cDNA, complete cds, 5 EST hits 

Sequenced by EMBL 

Locus: unknown 

Insert length: 3205 bp 

Poly A stretch at pos. 3192, polyadenylation signal at pos. 3171

1 GGCACGGCCG GGTCCTGGCT GGCCAAACGA GGCTCGCGGA AGCAGCAGCC 
51 GCCGCCTGAC CGCAGCTGGA TTTTGAAGAT TGATCCAAGG GACTGTATTA 

101 ATTTCAGGAA TTGATTTGAA AGACACTGGC TCTGCCACTT AACAGCCATG 

151 TAACCTTGGA TATGGAAGAA AGTAGCAGTG TTGCCATGTT GGTGCCAGAT 

201 ATTGGGGAAC AGGAAGCTAT ACTGACTGCT GAAAGTATCA TCAGTCCTTC 

251 ATTGGAAATT GATGAACAAA GAAAAACTAA ACCAGATCCA TTAATCCATG 

301 TTATCCAGAA GTTAAGCAAG ATAGAAAAAT GAAAAGTCAC AAAAATGTCT 

351 TTTAATTGGG AAGAAACGCC CACGTTCAAG TGCTGCAACA CACTCTCTTG 

401 AAACCCAAGA ACTTTGTGAG ATTCCGGCTA AAGTAATCCA GTCACCTGCT 

451 GCTGATACTA GAAGGGCTGA GATGTCACAA ACAAATTTTA CCCCTGACAC 

501 TCTTGCCCAG AATGAAGGGA AGGCTATGTC TTATCAGTGT AGCCTTTGTA 

551 AGTTTCTATC ATCATCCTTT TCCGTGTTAA AAGATCATAT TAAGCAACAT 

601 GGTCAGCAAA ATGAAGTGAT ACTGATGTGC TCAGAGTGCC ATATTACATC 

651 TAGAAGCCAG GAGGAACTTG AAGCCCACGT GGTGAATGAC CATGACAATG 

701 ATGCCAATAT CCACACCCAA TCCAAAGCCC AACAGTGCGT AAGCCCCTCC 

751 AGCTCTTTGT GTCGGAAAAC CACAGAAAGA AATGAAACCA TTCCAGATAT 

801 CCCAGTAAGT GTGGACAATC TACAGACTCA TACTGTCCAA ACTGCATCTG

851 TGGCAGAAAT GGGTAGGAGG AAATGGTATG CATACGAACA GTACGGCATG 

901 TATCGATGCT TGTTTTGTAG TTATACTTGT GGCCAGCAGA GAATGTTGAA 

951 AACACACGCT TGGAAACATG CTGGGGAGGT TGATTGCTCC TATCCAATCT 
1001 TTGAAAATGA AAATGAACCC CTAGGCCTGC TGGATTCTTC AGCAGCTGCT 
1051 GCGCCTGGTG GGGTCGATGC AGTCGTCATT GCTATTGGAG AGAGTGAACT 
1101 GAGTATCCAC AATGGGCCAT CAGTGCAAGT GCAGATTTGC AGCTCAGAAC 
1151 AGTTATCATC TTCATCTCCT TTAGAACAGA GTGCAGAAAG AGGAGTACAC 
1201 CTAAGTCAGT CAGTTACCCT GGACCCCAAT GAGGAAGAAA TGCTAGAAGT 
1251 GATTTCTGAT GCAGAGGAGA ATCTGATTCC TGATAGCCTG CTTACATCAG 
1301 CACAGAAAAT CATCAGCAGC AGCCCCAATA AAAAAGGGCA TGTTAACGTG 
1351 ATAGTGGAGC GATTGCCAAG TGCTGAAGAA ACCCTTTCAC AGAAGCGCTT
1401 CCTCATGAAC ACTGAAATGG AAGAAGGGAA GGACCTGAGC CTGACAGAAG 
1451 CTCAGATTGG GCGCGAAGGA ATGGATGATG TTTATCGTGC TGATAAATGT 
1501 ACTGTTGATA TTGGGGGATT GATCATAGGC TGGAGCAGTT CAGAGAAAAA 
1551 AGACGAGTTA ATGAATAAAG GCCTGGCTAC TGATGAGAAT GCCCCACCAG 
1601 GCCGGAGAAG GACAAATTCT GAGTCTCTTC GATTACACTC ATTAGCTGCA 
1651 GAAGCCCTTG TCACAATGCC TATAAGAGCT GCAGAGTTGA CAAGAGCCAA 
1701 CCTGGGGCAC TATGGAGATA TAAACCTTTT AGATCCAGAT ACTAGTCAAA 
1751 GGCAAGTAGA TAGTACATTG GCAGCGTACT CAAAAATGAT GTCGCCACTT 
1801 AAAAACTCTT CAGATGGATT AACTAGTCTT AACCAAAGCA ACTCCACCTT 
1851 GGTAGCACTC CCAGAGGGTA GGCAGGAATT GTCAGATGGG CAGGTTAAGA 
1901 CAGGCATCAG CATGTCCTTA CTCACCGTCA TTGAAAAATT GAGAGAAAGG 
1951 ACAGACCAAA ACGCTTCAGA CGATGACATT TTGAAAGAGT TGCAGGACAA 
2001 CGCCCAGTGC CAACCCAACA GCGATACAAG TTTGTCCGGA AACAATGTGG 
2051 TGGAATACAT CCCGAATGCT GAACGACCCT ACCGTTGCCG CCTGTGTCAC 
2101 TACACAAGTG GCAACAAGGG CTACATCAAG CAGCACTTAC GAGTCCATCG 
2151 ACAGAGACAG CCTTATCAGT GTCCTATCTG CGAGCACATA GCGGACAACA
2201 GCAAAGATTT GGAGAGTCAC ATGATCCACC ACTGTAAGAC AAGAATATAC 
2251 CAGTGCAAGC AGTGTGAAGA ATCCTTCCAT TATAAGAGTC AATTGAGGAA 
2301 CCATGAGAGA GAACAGCACA GTCTTCCAGA TACCTTGTCA ATAGCAACTT 
2351 CTAATGAGCC AAGAATTTCC AGTGATACAG CTGATGGAAA ATGTGTCCAG 
2401 GAAGGGAATA AGTCTTCAGT CCAGAAACAA TATAGATGTG ATGTGTGTGA 
2451 TTATACAAGT ACAACATATG TTGGTGTCAG AAACCACAGG CGAATCCATA 
2501 ACTCTGATAA GCCGTACAGA TGCTCTCTGT GTGGGTATGT GTGTAGCCAT 
2551 CCTCCTTCTT TGAAGTCTCA TATGTGGAAA CATGCAAGTG ACCAAAATTA
2601 CAACTACGAA CAAGTAAACA AGGCTATTAA CGACGCGATT TCACAAAGTG

2651 GCAGAGTTCT GGGGAAATCC CCTGGAAAGA CTCAATTAAA GAGCAGTGAA 

2701 GAGAGTGCAG ATCCCGTCAC TGGAAGTTCG GAAAATGCAG TGTCATCTTC 

2751 AGAACTGATG TCCCAGACTC CCAGTGAAGT TCTGGGTACC AACGAGAATG 

2801 AGAAACTGAG CCCTACAAGT AATACCTCAT ATAGTTTAGA AAAAATCTCC 

2851 AGTCTGGCCC CTCCTAGCAT GGAGTACTGC GTTTTACTCT TCTGCTGTTG 

2901 TATTTGTGGT TTTGAATCAA CCAGCAAAGA AAACCTCTTG GATCATATGA 

2951 AAGAGCACGA GGGTGAAATT GTAAACATCA TCCTGAATAA GGACCACAAT 

3001 ACAGCTCTAA ACACAAATTA GGTGGAATAA TGACTCGAGC AGGAAAGCAG 

3051 TAGAAGAGGA TTCCTTCACC ACAGTTTCAC CTTTACGCTG TCAGACAACT 

3101 TCCTGCCACA GAAGAAGTCG TTGATGTGAT TTTTGAGGAA ATGACAGATG 

3151 TGACTTTGGA ACCAAACTTG TAATAAAAGG AATTCCAAAT GGAAAAAAAA 

3201 AAAAA 



BLAST Results 



No BLAST result 



Medline entries 



90301500: 

Cloning and sequencing of a zinc finger cDNA expressed in mouse testis. 
92310982: 

Zfp-37, a new murine zinc finger encoding gene, is expressed in a
developmentally regulated pattern in the male germ line.



Peptide information for frame 1 



ORF from 472 bp to 3018 bp; peptide length: 849 
Category: similarity to known protein 



1 MSQTNFTPDT LAQNEGKAMS YQCSLCKFLS SSFSVLKDHI KQHGQQNEVI 

51 LMCSECHITS RSQEELEAHV VNDHDNDANI HTQSKAQQCV SPSSSLCRKT 

101 TERNETIPDI PVSVDNLQTH TVQTASVAEM GRRKWYAYEQ YGMYRCLFCS 

151 YTCGQQRMLK THAWKHAGEV DCSYPIFENE NEPLGLLDSS AAAAPGGVDA 

201 VVIAIGESEL SIHNGPSVQV QICSSEQLSS SSPLEQSAER GVHLSQSVTL 

251 DPNEEEMLEV ISDAEENLIP DSLLTSAQKI ISSSPNKKGH VNVIVERLPS 

301 AEETLSQKRF LMNTEMEEGK DLSLTEAQIG REGMDDVYRA DKCTVDIGGL 

351 IIGWSSSEKK DELMNKGLAT DENAPPGRRR TNSESLRLHS LAAEALVTMP 

401 IRAAELTRAN LGHYGDINLL DPDTSQRQVD STLAAYSKMM SPLKNSSDGL 

451 TSLNQSNSTL VALPEGRQEL SDGQVKTGIS MSLLTVIEKL RERTDQNASD 

501 DDILKELQDN AQCQPNSDTS LSGNNVVEYI PNAERPYRCR LCHYTSGNKG

551 YIKQHLRVHR QRQPYQCPIC EHIADNSKDL ESHMIHHCKT RIYQCKQCEE 

601 SFHYKSQLRN HEREQHSLPD TLSIATSNEP RISSDTADGK CVQEGNKSSV 

651 QKQYRCDVCD YTSTTYVGVR NHRRIHNSDK PYRCSLCGYV CSHPPSLKSH 

701 MWKHASDQNY NYEQVNKAIN DAISQSGRVL GKSPGKTQLK SSEESADPVT

751 GSSENAVSSS ELMSQTPSEV LGTNENEKLS PTSNTSYSLE KISSLAPPSM 

801 EYCVLLFCCC ICGFESTSKE NLLDHMKEHE GEIVNIILNK DHNTALNTN 



BLASTP hits 



Entry S10245 from database PIR: 
finger protein, testis - mouse 

Score = 265, P = 8.4e-23, identities = 61/205, positives = 91/205

Entry S22954 from database PIR:
finger protein zfp-37 - mouse

Score = 265, P = 9.1e-22, identities = 61/205, positives = 91/205

Entry AF031657_1 from database TREMBL:
gene: "Zfp94"; product: "zinc-finger protein 94"; Rattus norvegicus
zinc-finger protein 94 (Zfp94) gene, partial cds.

Score = 243, P = 1.6e-21, identities = 57/190, positives = 85/190



Alert BLASTP hits for DKFZphtes3_2el2, frame 1 
No Alert BLASTP hits found

Pedant information for DKFZphtes3_2el2, frame 1



Report for DKFZphtes3_2el2.1

[LENGTH] 849
[MW] 94325.42
[pI] 5.47
[HOMOL] PIR:A54661 zinc finger protein ZNF41 - human (fragment) 2e-22
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YJL056c] 3e-09
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YJL056c] 3e-09
[FUNCAT] 04.03.01 tRNA synthesis [S. cerevisiae, YPR186c PZF1 - TFIIIA] 1e-07
[FUNCAT] 04.01.01 rRNA synthesis [S. cerevisiae, YPR186c PZF1 - TFIIIA] 1e-07
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YOR113w] 4e-07
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YGL209w] 2e-04
[FUNCAT] 13.04 homeostasis of other ions [S. cerevisiae, YNL027w] 2e-04
[FUNCAT] 11.01 stress response [S. cerevisiae, YMR037c] 3e-04
[BLOCKS] BL00028 Zinc finger, C2H2 type, domain proteins
[SCOP] d1meyg_ 9.6.1.1.1 a designed zinc finger protein [syntheti 8e-06
[PIRKW] nucleus 8e-18
[PIRKW] RNA binding 5e-13
[PIRKW] duplication 7e-13
[PIRKW] tandem repeat 1e-21
[PIRKW] spermatogenesis 6e-16
[PIRKW] zinc 9e-21
[PIRKW] zinc finger 1e-21
[PIRKW] DNA binding 1e-21
[PIRKW] metal binding 3e-15
[PIRKW] phosphoprotein 5e-13
[PIRKW] leucine zipper 1e-13
[PIRKW] alternative splicing 6e-18
[PIRKW] eye lens 2e-16
[PIRKW] oocyte 1e-12
[PIRKW] transcription factor 6e-18
[PIRKW] segmentation 7e-13
[PIRKW] embryo 1e-12
[PIRKW] transcription regulation 2e-19
[PIRKW] homeobox 2e-08
[SUPFAM] POZ domain homology 7e-15
[SUPFAM] transcription factor Krueppel 7e-13
[SUPFAM] zinc finger protein ZFP-36 1e-21
[SUPFAM] homeobox homology 2e-08
[SUPFAM] unassigned homeobox proteins 2e-08
[PROSITE] CYTOCHROME_C 1
[PROSITE] MYRISTYL 10
[PROSITE] ZINC_FINGER_C2H2 3
[PROSITE] AMIDATION 2
[PROSITE] CAMP_PHOSPHO_SITE 2
[PROSITE] CK2_PHOSPHO_SITE 18
[PROSITE] TYR_PHOSPHO_SITE 3
[PROSITE] PKC_PHOSPHO_SITE 10
[PROSITE] ASN_GLYCOSYLATION 7
[PFAM] Zinc finger, C2H2 type
[KW] Irregular
[KW] 3D
[KW] LOW_COMPLEXITY 5.65 %



SEQ MSQTNFTPDTLAQNEGKAMSYQCSLCKFLSSSFSVLKDHIKQHGQQNEVILMCSECHITS 

SEG xxxxxxxxxxxxxxx 

lmeyF 

SEQ RSQEELEAHVVNDHDNDANIHTQSKAQQCVSPSSSLCRKTTERNETIPDIPVSVDNLQTH 

SEG 

lmeyF 

SEQ TVQTASVAEMGRRKWYAYEQYGMYRCLFCSYTCGQQRMLKTHAWKHAGEVDCSYPIFENE

SEG 

lmeyF 

SEQ NEPLGLLDSSAAAAPGGVDAVVIAIGESELSIHNGPSVQVQICSSEQLSSSSPLEQSAER 

SEG xxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxx . . . 

lmeyF 

SEQ GVHLSQSVTLDPNEEEMLEVISDAEENLIPDSLLTSAQKIISSSPNKKGHVNVIVERLPS 

SEG 

lmeyF

SEQ AEETLSQKRFLMNTEMEEGKDLSLTEAQIGREGMDDVYRADKCTVDIGGLIIGWSSSEKK

SEG 

lmeyF 

SEQ DELMNKGLATDENAPPGRRRTNSESLRLHSLAAEALVTMPIRAAELTRANLGHYGDINLL 

SEG 

lmeyF 

SEQ DPDTSQRQVDSTLAAYSKMMSPLKNSSDGLTSLNQSNSTLVALPEGRQELSDGQVKTGIS

SEG 

lmeyF 

SEQ MSLLTVIEKLRERTDQNASDDDILKELQDNAQCQPNSDTSLSGNNVVEYIPNAERPYRCR

SEG 

lmeyF TTTEETT 

SEQ LCHYTSGNKGYIKQHLRVHRQRQPYQCPICEHIADNSKDLESHMIHHCKTRIYQCKQCEE 

SEG 

lmeyF TTTCEETTHHHHHHHHHHHHTTCCEEETTTTEEECCHHHHHHHHHHHHCCCCEEETTTTE 

SEQ SFHYKSQLRNHEREQHSLPDTLSIATSNEPRISSDTADGKCVQEGNKSSVQKQYRCDVCD 

SEG 

lmeyF EECCHHHHHHHHHHHC 

SEQ YTSTTYVGVRNHRRIHNSDKPYRCSLCGYVCSHPPSLKSHMWKHASDQNYNYEQVNKAIN 

SEG 

lmeyF 

SEQ DAISQSGRVLGKSPGKTQLKSSEESADPVTGSSENAVSSSELMSQTPSEVLGTNENEKLS 

SEG 

lmeyF 

SEQ PTSNTSYSLEKISSLAPPSMEYCVLLFCCCICGFESTSKENLLDHMKEHEGEIVNIILNK 

SEG 

lmeyF 

SEQ DHNTALNTN 

SEG 

lmeyF 



Prosite for DKFZphtes3_2el2.1



PS00001   104->108    ASN_GLYCOSYLATION   PDOC00001
PS00001   445->449    ASN_GLYCOSYLATION   PDOC00001
PS00001   454->458    ASN_GLYCOSYLATION   PDOC00001
PS00001   457->461    ASN_GLYCOSYLATION   PDOC00001
PS00001   497->501    ASN_GLYCOSYLATION   PDOC00001
PS00001   646->650    ASN_GLYCOSYLATION   PDOC00001
PS00001   784->788    ASN_GLYCOSYLATION   PDOC00001
PS00004   98->102     CAMP_PHOSPHO_SITE   PDOC00004
PS00004   378->382    CAMP_PHOSPHO_SITE   PDOC00004
PS00005   59->62      PKC_PHOSPHO_SITE    PDOC00005
PS00005   101->104    PKC_PHOSPHO_SITE    PDOC00005
PS00005   306->309    PKC_PHOSPHO_SITE    PDOC00005
PS00005   357->360    PKC_PHOSPHO_SITE    PDOC00005
PS00005   385->388    PKC_PHOSPHO_SITE    PDOC00005
PS00005   425->428    PKC_PHOSPHO_SITE    PDOC00005
PS00005   678->681    PKC_PHOSPHO_SITE    PDOC00005
PS00005   696->699    PKC_PHOSPHO_SITE    PDOC00005
PS00005   726->729    PKC_PHOSPHO_SITE    PDOC00005
PS00005   817->820    PKC_PHOSPHO_SITE    PDOC00005
PS00006   62->66      CK2_PHOSPHO_SITE    PDOC00006
PS00006   106->110    CK2_PHOSPHO_SITE    PDOC00006
PS00006   126->130    CK2_PHOSPHO_SITE    PDOC00006
PS00006   232->236    CK2_PHOSPHO_SITE    PDOC00006
PS00006   262->266    CK2_PHOSPHO_SITE    PDOC00006
PS00006   300->304    CK2_PHOSPHO_SITE    PDOC00006
PS00006   314->318    CK2_PHOSPHO_SITE    PDOC00006
PS00006   323->327    CK2_PHOSPHO_SITE    PDOC00006
PS00006   355->359    CK2_PHOSPHO_SITE    PDOC00006
PS00006   381->385    CK2_PHOSPHO_SITE    PDOC00006
PS00006   485->489    CK2_PHOSPHO_SITE    PDOC00006
PS00006   499->503    CK2_PHOSPHO_SITE    PDOC00006
PS00006   617->621    CK2_PHOSPHO_SITE    PDOC00006
PS00006   626->630    CK2_PHOSPHO_SITE    PDOC00006
PS00006   741->745    CK2_PHOSPHO_SITE    PDOC00006
PS00006   758->762    CK2_PHOSPHO_SITE    PDOC00006
PS00006   766->770    CK2_PHOSPHO_SITE    PDOC00006
PS00006   817->821    CK2_PHOSPHO_SITE    PDOC00006
PS00007   331->339    TYR_PHOSPHO_SITE    PDOC00007
PS00007   703->711    TYR_PHOSPHO_SITE    PDOC00007
PS00007   596->605    TYR_PHOSPHO_SITE    PDOC00007
PS00008   142->148    MYRISTYL            PDOC00008
PS00008   185->191    MYRISTYL            PDOC00008
PS00008   196->202    MYRISTYL            PDOC00008
PS00008   241->247    MYRISTYL            PDOC00008
PS00008   349->355    MYRISTYL            PDOC00008
PS00008   473->479    MYRISTYL            PDOC00008
PS00008   478->484    MYRISTYL            PDOC00008
PS00008   645->651    MYRISTYL            PDOC00008
PS00008   751->757    MYRISTYL            PDOC00008
PS00008   772->778    MYRISTYL            PDOC00008
PS00009   130->134    AMIDATION           PDOC00009
PS00009   376->380    AMIDATION           PDOC00009
PS00028   146->167    ZINC_FINGER_C2H2    PDOC00028
PS00028   684->705    ZINC_FINGER_C2H2    PDOC00028
PS00028   595->617    ZINC_FINGER_C2H2    PDOC00028
PS00190   53->59      CYTOCHROME_C        PDOC00169
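
The ZINC_FINGER_C2H2 hits can be illustrated with a regular expression. This is a minimal sketch that assumes the PROSITE PS00028 pattern is C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H and that the listing reports a 1-based start with an exclusive end; both points should be checked against the PROSITE entry. The helper name `find_c2h2` and the two 30-residue fragments (residues 141 to 170 and 681 to 710 of the peptide) are chosen for this example only.

```python
import re

# PS00028 (assumed form): C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
C2H2 = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")


def find_c2h2(fragment: str, offset: int) -> list[tuple[int, int]]:
    """Return (start, end) of each match, 1-based start, exclusive end.

    `offset` is the 1-based position of the fragment's first residue
    in the full-length peptide.
    """
    return [
        (offset + m.start(), offset + m.end())
        for m in C2H2.finditer(fragment)
    ]


if __name__ == "__main__":
    # Residues 141..170 and 681..710 of the DKFZphtes3_2el2 frame 1 peptide
    print(find_c2h2("YGMYRCLFCSYTCGQQRMLKTHAWKHAGEV", 141))
    print(find_c2h2("PYRCSLCGYVCSHPPSLKSHMWKHASDQNY", 681))
```

Under these assumptions the two fragments yield the ranges 146->167 and 684->705, matching two of the three PS00028 rows.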



Pfam for DKFZphtes3_2el2.1



similarity to finger proteins 



similarity to finger proteins 



similarity to finger proteins 



similarity to finger proteins 



similarity to finger proteins 



similarity to finger proteins 



790 



HMM_NAME Zinc finger, C2H2 type 

HMM *CpwPDCgKtFrrwsNLrRHMR.T.H* 
C++ C+ T R++++L++H H 
Query 53 CSE--CHITSRSQEELEAHVVN-DH 74 

23.25 (bits) f: 539 t: 559 Target: dkfzphtes3_2el2.1 

Alignment to HMM consensus: 
Query *CpwPDCgKtFrrwsNLrRHMRTH* 
C C++T ++ ++H+R+H 
dkfzphtes3 539 CRL--CHYTSGNKGYIKQHLRVH 559 

Query f: 567 t: 587 Target: dkfzphtes3_2el2.1 

Alignment to HMM consensus: 
HMM *CpwPDCgKtFrrwsNLrRHMRTH* 
CP+ C+ ++ +L+ HM+ H 
Query 567 CPI--CEHIADNSKDLESHMIHH 587 

33.47 (bits) f: 595 t: 616 Target: dkfzphtes3_2el2.1 

Alignment to HMM consensus: 
Query *CpwPDCgKtFrrwsNLrRHMR.T.H* 
C+ C+++F ++S+LR+H R H 
dkfzphtes3 595 CKQ--CEESFHYKSQLRNHERE-QH 616 

Query f: 656 t: 676 Target: dkfzphtes3_2el2.1 

Alignment to HMM consensus: 
HMM *CpwPDCgKtFrrwsNLrRHMRTH* 
C++ C++T ++ R+H+R+H 
Query 656 CDV--CDYTSTTYVGVRNHRRIH 676 

24.53 (bits) f: 684 t: 704 Target: dkfzphtes3_2el2.1 

Alignment to HMM consensus: 
Query *CpwPDCgKtFrrwsNLrRHMRTH* 
C+ CG++ +++ +L+ HM H 
dkfzphtes3 684 CSL--CGYVCSHPPSLKSHMWKH 704 

Query f: 809 t: 829 Target: dkfzphtes3_2el2.1 

Alignment to HMM consensus: 
HMM *CpwPDCgKtFrrwsNLrRHMRTH* 
C + CG ++++NL HM+ H 
Query 809 CCI--CGFESTSKENLLDHMKEH 829 
DKFZphtes3_2f14 



group: testes derived 

DKFZphtes3_2f14 encodes a novel 129 amino acid protein with very weak similarity to human 
omega protein. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

weak similarity to omega protein 

complete cDNA, complete cds, 1 EST hit 

Sequenced by EMBL 

Locus: unknown 

Insert length: 2353 bp 

Poly A stretch at pos. 2341, no polyadenylation signal found 

1 GCAGATTCTC CAGGCCCAGC ATCTGCCTCA CCGTGGCCCC CCACAAGCCA 
51 AGCGCCTGCC TTTCAGCAGC CTCTACACAC CCAGCTCCTG CCACCCAATG 

101 GCTCTTTAGG CCAAGCTCAT ACCTCACGAT GATTTTTCCA GGCCCAACTT 

151 TTGTCTCATG GCAACCTTCC CTGGCCAAGT TTCCACCTAT TTCCTGGCAG 

201 CCTGGACAGG CCCAGGTCCT GCCACACACT GGCCTCTCTA CGCCCAGCTC 

251 ATGCCTCACA GTGGCCTCTC CAGGCCCAGC TCCTGTCCCG GGACATCATC 

301 TCCAGGCCCA AAACTTCCTC AAGTCGGCCT CTCCAGGCCC AGTTGCTGCC 

351 TCCCGGCATT CTCTCCAGGC CTAGCTCTTC CTCCTGGCTG TATCTACAAG 

401 ACCAACTCCT GCCTCACAAC AACCTTTTAT GGCTCAGCTC CTGCCCAACT 

451 ACTGCCGGCC TTTGTAGGCC CAAAACTTCC TCAAGTCAAG CTCTTTAGGC 

501 CCACCTTCTG CCTTGCAGTG GCCTGTACAG ACCCAGCTCT GGCTTGAGAA 

551 CAGCCTCTGC AGGCCCTGCT CTTGCCTCTT AGCTCCCTCT CCAGGCCCAT 

601 CTCTTGCCTC ACAGTGGCTT CCGTGGGCCA AGTTCCCGCC TGCCTCCCAG 

651 CAGCCTCAAC AGGCCTAGCT CCTCCCTCAC AATGGCTTGT TTAGGTCCAG 

701 TTGATGCCTC TGGCAACCTG TCCAGGCCCA GCTCCTGCCT CACACTGGCC 

751 TCTCTAGGCC GAGGTCCTTT CTCATACTGG CCTGTTTAGG CCCAGCTCAT 

801 TCCTCTTGTC ATCTCTCCAG GCCCAGCTTT TGCCTGTTGT TGGCCTCTAC 

851 CTCACAGTGC ACCTTCCAGT CCCACCTCTT GCCTCACCAT GGCCTCCTCT 

901 GACCAGGTTC CTGCCTTTCG GCAGCCTCTA CAGGCCTAGC TGCTGCCTCC 

951 CAATGGCCTT TGTAGGCCAC GCTCATGCCT CACTGTGGCC TTTCCAGGCC 
1001 TAGCTTTCGC TTTTTGGCCA CTCCAGGCCC AGAACTTCCC CCAGTCAGCC 
1051 TCTCCAGGCC CAGCTCTTCC TCCCAGCAAC CTCTGCAGGC CCAAATCATC 
1101 CTCAAATTGG CCTCTTCTTT CCCAGCTCCT GCCTCCTGGT GGCCTCTGAA 
1151 GACCCAAATC GTCCTCCAGT TGGTTTTTCC AGGCCCAGCT CCTGCCTTTT 
1201 GGTGGCCTCT CCAGGTGCAA AACTTCCTCC CATCAGCCTG TCCAGGCCCA 
1251 GCTCATGCCT CTTGGTGGCC TTCTCAGGCC CTGCTTTTGA CTTGGTGGCC 
1301 TCTTCAGGCC CAGAACTTGA ACTCAAGTCA GCCTCTCCAG GCCCAGCTCC 
1351 TGCCTTCTTA AGGTCTGTAC AGGCCCAGCC TCTACCTCAC AGCGGACTCT 
1401 CCACACCCAG CTCTTGCCTC ACTGTAGCCT CCCCAGTCCA AAACTCCTGC 
1451 CTTTTGGCAG CTTCGACAAG CCCAGCTCCT GCCTTTCAAT GACCTCTTTA 
1501 GGCCCCGCTC ATTCCTTACA ACGGCCTTTC CAGGCCCAGT TTTTCCCTTT 
1551 TGGCGGCCTC TCCAGGCCCA GAACTTCCTC AAGTCGGCCT CTTTAGGCCC 
1601 AGTTGCTGCC TCCTGGCATC CTCTGCAGGC CGAGCTCTTC CTCCCTGCTG 
1651 TGTCTACAGG CCCAACTCCT GCCTCACAAC AACCTCCTTG GACTCAGCTT 
1701 CTGCCCAGCT CCTGGTGGCC TTTGTAGGCT CAAAATTTTC TCAAATCAAG 
1751 CTCTCCAGGC CTACTGTCAG CCTCGTGGCA GCCTAAACAG GCCCAGCTCC 
1801 TGCCTGACAA TGGCCTCTCC AGGCTTTTCT CCTGCCTCGC AGCAGGCTTT 
1851 CCAGGCCCAG CTCTTGCCTC ATGGTGGCCT TCCCCGGCCA TGTTCCTATC 
1901 TGACTTCTGG CAGCCTCAAC CGGCCCAGCT TCTGCCTCAC ACTGGCCTCT 
1951 CTAGGCCCAG CTCCTTTTTC ACAGTGGCCT CACTACGCCC ATCTCCTACC 
2001 TCAGATCTGC CTCCCAAGAC CCAGCTCCTG TCTCATGGTG GTCTCTCTTA 
2051 CACCAGCTCC TGCCTCACAA TGGCCTCGTC TGGCCCATCT TCTGCCTCAC 
2101 AGTGGCCACT CAAGGCCCAT CTTTTGCCTC ATGGTAGCCT CTTCTGGTTT 
2151 TGCTCTTGCC TCACAGTTGC CTCTTCCAGA TCCAGCTTTA AGCCTTTGAT 
2201 GGTCAACAGC ATCAAGGAGC CTAAAGCTTC CCTGGACTCT CATTTGTTCA 
2251 CTTTACAGCA GAGTGCCTTA GCAAAAACTG TCTCTTAACC TTGAGAGTGG 
2301 ATTTCTGACA AATCGATAGT AAATTCTGCC TGTGTGGTTT CAAAAAAAAA 
2351 AAA 



BLAST Results 



No BLAST result 
Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 158 bp to 544 bp; peptide length: 129 
Category: similarity to known protein 
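
The ORF coordinates and peptide lengths reported in these entries can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming the coordinates are 1-based and inclusive and that the stop codon lies outside the reported range; the function name `orf_peptide_length` is hypothetical and is not part of the original annotation pipeline.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Return the number of codons in an ORF given 1-based inclusive coordinates.

    Assumption: the reported range excludes the stop codon, so the
    number of bases must be a multiple of three.
    """
    n_bases = end_bp - start_bp + 1
    if n_bases <= 0 or n_bases % 3 != 0:
        raise ValueError(f"ORF {start_bp}..{end_bp} is not a whole number of codons")
    return n_bases // 3


if __name__ == "__main__":
    # Coordinates as reported for the ORF above.
    print(orf_peptide_length(158, 544))
```

Under these assumptions the range 158 to 544 spans 387 bases, which agrees with the stated peptide length of 129.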



1 MATFPGQVST YFLAAWTGPG PATHWPLYAQ LMPHSGLSRP SSCPGTSSPG 
51 PKLPQVGLSR PSCCLPAFSP GLALPPGCIY KTNSCLTTTF YGSAPAQLLP 
101 AFVGPKLPQV KLFRPTFCLA VACTDPALA 
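
To illustrate how the peptide above relates to the nucleotide sequence, the sketch below translates the first 33 bases of the ORF (positions 158 to 190 of the insert) with the standard genetic code. The helper `translate` and the constant names are hypothetical; the codon table is built from the standard code in TCAG order.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; blanks are ignored, trailing partial codons dropped."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Bases 158..190 of the insert sequence listed above (start of the ORF).
ORF_START_FRAGMENT = "ATG GCAACCTTCC CTGGCCAAGT TTCCACCTAT"

if __name__ == "__main__":
    print(translate(ORF_START_FRAGMENT))
```

The translated fragment corresponds to the first eleven residues of the listed peptide.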

BLASTP hits 

Entry 170697 from database PIR: 
omega protein - human (fragment) 

Score = 79, P = 2.8e-03, identities = 32/94, positives = 38/94 



Alert BLASTP hits for DKFZphtes3_2f14, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_2f14, frame 2 

Report for DKFZphtes3_2f14.2 



[LENGTH] 129 

[MW] 13421.76 

[pI] 9.14 

[PROSITE] MYRISTYL 2 

[KW] Irregular 

[KW] LOW_COMPLEXITY 10.85 % 



SEQ MATFPGQVSTYFLAAWTGPGPATHWPLYAQLMPHSGLSRPSSCPGTSSPGPKLPQVGLSR 

SEG xxxxxxxxxxxxxx 

PRD cccccccceeehhhhhcccccccccccccccccccccccccccccccccccccccccccc 

SEQ PSCCLPAFSPGLALPPGCIYKTNSCLTTTFYGSAPAQLLPAFVGPKLPQVKLFRPTFCLA 

SEG 

PRD cccccccccccccccccccccccccceeeccccccccccccccccccccccccccccccc 

SEQ VACTDPALA 

SEG 

PRD ccccccccc 



Prosite for DKFZphtes3_2f14.2 

PS00008 6->12 MYRISTYL PDOC00008 

PS00008 92->98 MYRISTYL PDOC00008 
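
The two MYRISTYL hits above can be reproduced with a small pattern scan. This is a minimal sketch, assuming the PROSITE N-myristoylation pattern `G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}` and assuming that the listing reports a 1-based start with an end of start + 6; the function name `scan_myristyl` is hypothetical.

```python
import re

# PROSITE PS00008 pattern G-{EDRKHPFYW}-x(2)-[STAGCN]-{P} as a regular expression.
# The lookahead lets overlapping sites be reported.
MYRISTYL_REGEX = re.compile(r"(?=(G[^EDRKHPFYW].{2}[STAGCN][^P]))")

# Peptide of DKFZphtes3_2f14, frame 2, as listed in the SEQ lines above.
PEPTIDE = (
    "MATFPGQVSTYFLAAWTGPGPATHWPLYAQLMPHSGLSRPSSCPGTSSPGPKLPQVGLSR"
    "PSCCLPAFSPGLALPPGCIYKTNSCLTTTFYGSAPAQLLPAFVGPKLPQVKLFRPTFCLA"
    "VACTDPALA"
)


def scan_myristyl(seq: str) -> list[tuple[int, int]]:
    """Return (start, end) pairs in the listing's apparent convention (end = start + 6)."""
    return [(m.start() + 1, m.start() + 7) for m in MYRISTYL_REGEX.finditer(seq)]


if __name__ == "__main__":
    for start, end in scan_myristyl(PEPTIDE):
        print(f"PS00008 {start}->{end} MYRISTYL PDOC00008")
```

The coordinate convention is inferred from the listing itself, not from PROSITE documentation, so it should be treated as an assumption.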



(No Pfam data available for DKFZphtes3_2f14.2) 






DKFZphtes3_2g7 



group: testes derived 

DKFZphtes3_2g7 encodes a novel 359 amino acid protein with similarity to neurofilament 
proteins. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to neurofilament proteins 

complete cDNA, complete cds, 6 EST hits (5 hits are out of a testis 
library) 

Sequenced by EMBL 

Locus: unknown 

Insert length: 1613 bp 

Poly A stretch at pos. 1595, polyadenylation signal at pos. 1557 

1 GCCACACAGG CTCCTTGGAG TAAGAGTGTG AGAAACTGGA TGAAGACAGC 
51 TGTATTCTTT TGGAAGCGTT CGAGATTGGT CTGTCTCTAC CAACTAAAAA 

101 CTTCTAGCTT AAGTGCAGAG ATTTAAGGAG ATCAACAAAA ACTCAGTCTA 

151 GACATATTAT GAGGCTGGGA GGGTATCAAC AGACTTGAGT TCTTGTCAGC 

201 AAGATCACCT GCTTTTAATA TTGTCCTCAG GGTCTGAGCA CATCTGGAAG 

251 TGAGGTCAAT CAAGTTAGAC CCCAAAAACT TTTGTGACAA CAGTGAAGAG 

301 GGGAAAATAA ACACACCACA AACATGAACC TCAACCCCCC GACATCTGCT 

351 CTTCAGATCG AGGGCAAAGG CAGCCATATT ATGGCTAGAA ATGTAAGCTG 

401 CTTTCTAGTC AGGCACACCC CTCATCCCAG AAGAGTCTGC CACATCAAAG 

451 GCTTGAATAA CATTCCAATC TGTACTGTGA ATGATGATGA GAATGCATTT 

501 GGAACATTGT GGGAAGTTGG CCAGTCTAAC TACTTAGAGA AGAACAGGAT 

551 ACCATTTGCC AATTGCAGTT ACCCCCCGAG CACTGCAGTC CAGAAGAGCC 

601 CTGTAAGAGG AATGTCGCCA GCCCCAAACG GTGCCAAAGT GCCTCCACGG 

651 CCTCATTCTG AGCCCAGTAG AAAAATTAAA GAGTGCTTCA AAACTTCCAG 

701 TGAGAATCCC TTAGTAATTA AAAAGGAAGA AATTAAGGCC AAAAGACCAC 

751 CATCACCTCC AAAGGCATGC TCTACTCCTG GCTCCTGTTC TTCAGGGATG 

801 ACAAGTACCA AGAATGATGT GAAAGCAAAC ACCATTTGCA TACCAAACTA 

851 TCTGGATCAG GAAATAAAAA TCCTGGCAAA GCTCTGTAGC ATTTTGCATA 

901 CTGATTCTCT GGCAGAAGTT TTACAGTGGC TGCTTCATGC AACTTCAAAA 

951 GAAAAAGAGT GGGTCTCAGC TTTGATTCAT TCTGAGCTTG CCGAGATAAA 
1001 CCTGTTAACT CATCACAGAA GAAACACCTC AATGGAACCA GCAGCAGAGA 
1051 CTGGGAAGCC ACCCACAGTT AAATCACCAC CCACAGTTAA ATTGCCCCCA 
1101 AATTTTACTG CAAAATCAAA AGTGCTGACC AGAGATACAG AAGGGGATCA 
1151 ACCAACCAGA GTGTCAAGTC AAGGATCTGA AGAAAACAAG GAAGTACCAA 
1201 AAGAGGCTGA GCACAAGCCT CCACTACTTA TAAGAAGAAA TAATATGAAA 
1251 ATACCTGTTG CAGAATATTT CAGCAAACCA AATTCTCCTC CCAGGCCTAA 
1301 CACTCAGGAG AGTGGATCAG CAAAACCAGT GTCAGCAAGG AGTATACAAG 
1351 AATACAACCT CTGTCCCCAA AGAGCATGTT ATCCTTCAAC ACACCGGAGG 
1401 TAGAAGTTCT AGACTGGGTG AATTCTTTCA TGAATATGAG CTTCACATTT 
1451 ACATCATCAA ATTATTTTTC AAATGAATAT TTTTGGTATT GAGGAATCAA 
1501 GTGGTCCTCT TTATGGTGGC ACATGTAAAT CTAAAAATAC CTGTATGTAA 
1551 TGCTACAAAT AAATATTACT GGAAATGATA TTTCCATTTG TAGTTAAAAA 
1601 AAAAAAAAAA AAA 
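
The poly-A stretch and polyadenylation signal positions reported for this insert can be checked against the 3' end of the sequence. The sketch below is illustrative only: it assumes the canonical `AATAAA` hexamer is the signal being reported, and the function names and the `TAIL_OFFSET` constant are hypothetical. For this fragment, the reported positions appear consistent with 0-based offsets, which is an observation from the data rather than a documented convention.

```python
def find_polya_start(seq: str, min_run: int = 8):
    """Return the 0-based index where the trailing run of A's begins, or None if too short."""
    stripped = seq.rstrip("A")
    run_length = len(seq) - len(stripped)
    return len(stripped) if run_length >= min_run else None


def find_polya_signal(seq: str, polya_start: int, signal: str = "AATAAA"):
    """Return the 0-based index of the last signal hexamer upstream of the poly-A run, or None."""
    idx = seq.rfind(signal, 0, polya_start)
    return idx if idx != -1 else None


# Bases 1551..1613 of the insert listed above; the first base has 0-based offset 1550.
TAIL_OFFSET = 1550
TAIL = (
    "TGCTACAAAT" "AAATATTACT" "GGAAATGATA" "TTTCCATTTG" "TAGTTAAAAA"
    "AAAAAAAAAA" "AAA"
)

if __name__ == "__main__":
    polya = find_polya_start(TAIL)
    signal = find_polya_signal(TAIL, polya)
    print("poly-A start:", polya + TAIL_OFFSET)
    print("signal start:", signal + TAIL_OFFSET)
```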

BLAST Results 



No BLAST result 

Medline entries 

No Medline entry 

Peptide information for frame 3 



ORF from 324 bp to 1400 bp; peptide length: 359 
Category: similarity to known protein 
1 MNLNPPTSAL QIEGKGSHIM ARNVSCFLVR HTPHPRRVCH IKGLNNIPIC 
51 TVNDDENAFG TLWEVGQSNY LEKNRIPFAN CSYPPSTAVQ KSPVRGMSPA 
101 PNGAKVPPRP HSEPSRKIKE CFKTSSENPL VIKKEEIKAK RPPSPPKACS 
151 TPGSCSSGMT STKNDVKANT ICIPNYLDQE IKILAKLCSI LHTDSLAEVL 
201 QWLLHATSKE KEWVSALIHS ELAEINLLTH HRRNTSMEPA AETGKPPTVK 
251 SPPTVKLPPN FTAKSKVLTR DTEGDQPTRV SSQGSEENKE VPKEAEHKPP 
301 LLIRRNNMKI PVAEYFSKPN SPPRPNTQES GSAKPVSARS IQEYNLCPQR 
351 ACYPSTHRR 



BLASTP hits 



Entry A43427 from database PIR: 

neurofilament triplet H1 protein - rabbit (fragment) 

Score = 118, P = 5.6e-04, identities = 79/290, positives = 110/290 

Entry RNNFH_1 from database TREMBL: 

Rat high molecular weight neurofilament (NF-H) protein mRNA, 3' end. 
Score = 115, P = 9.5e-04, identities = 69/281, positives = 100/281 

Entry B43427 from database PIR: 

neurofilament protein H form H2 (repetitive region) - rabbit (fragment) 
Score = 111, P = 1.3e-03, identities = 64/269, positives = 102/269 



Alert BLASTP hits for DKFZphtes3_2g7, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_2g7, frame 3 



Report for DKFZphtes3_2g7.3 



[LENGTH] 359 

[MW] 39725.53 

[pI] 9.45 

[PROSITE] MYRISTYL 3 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 9 

[PROSITE] PKC_PHOSPHO_SITE 10 

[PROSITE] ASN_GLYCOSYLATION 4 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 4.18 % 



SEQ MNLNPPTSALQIEGKGSHIMARNVSCFLVRHTPHPRRVCHIKGLNNIPICTVNDDENAFG 

SEG 

PRD ccccccccceeecccccceeeeccceeeeecccccccccccccccccccccccccccccc 

SEQ TLWEVGQSNYLEKNRIPFANCSYPPSTAVQKSPVRGMSPAPNGAKVPPRPHSEPSRKIKE 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccchhhhhh 

SEQ CFKTSSENPLVIKKEEIKAKRPPSPPKACSTPGSCSSGMTSTKNDVKANTICIPNYLDQE 

SEG 

PRD hcccccccceeeehhhhhhccccccccccccccccccccccccccccceeeeccccchhh 

SEQ IKILAKLCSILHTDSLAEVLQWLLHATSKEKEWVSALIHSELAEINLLTHHRRNTSMEPA 

SEG 

PRD hhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccc 

SEQ AETGKPPTVKSPPTVKLPPNFTAKSKVLTRDTEGDQPTRVSSQGSEENKEVPKEAEHKPP 

SEG . . . .xxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccceeeeecccccccceeeeccccccccccccccccccc 

SEQ LLIRRNNMKIPVAEYFSKPNSPPRPNTQESGSAKPVSARSIQEYNLCPQRACYPSTHRR 

SEG 

PRD eeeeccccccceeeeecccccccccccccccccccchhhhhhccccccccccccccccc 



Prosite for DKFZphtes3_2g7.3 

PS00001 23->27 ASN_GLYCOSYLATION PDOC00001 

PS00001 80->84 ASN_GLYCOSYLATION PDOC00001 

PS00001 234->238 ASN_GLYCOSYLATION PDOC00001 

PS00001 260->264 ASN_GLYCOSYLATION PDOC00001 

PS00004 232->236 CAMP_PHOSPHO_SITE PDOC00004 

PS00005 115->118 PKC_PHOSPHO_SITE PDOC00005 

PS00005 161->164 PKC_PHOSPHO_SITE PDOC00005 

PS00005 207->210 PKC_PHOSPHO_SITE PDOC00005 

PS00005 243->246 PKC_PHOSPHO_SITE PDOC00005 

PS00005 248->251 PKC_PHOSPHO_SITE PDOC00005 

PS00005 254->257 PKC_PHOSPHO_SITE PDOC00005 

PS00005 262->265 PKC_PHOSPHO_SITE PDOC00005 

PS00005 332->335 PKC_PHOSPHO_SITE PDOC00005 

PS00005 337->340 PKC_PHOSPHO_SITE PDOC00005 

PS00005 356->359 PKC_PHOSPHO_SITE PDOC00005 

PS00006 51->55 CK2_PHOSPHO_SITE PDOC00006 

PS00006 61->65 CK2_PHOSPHO_SITE PDOC00006 

PS00006 124->128 CK2_PHOSPHO_SITE PDOC00006 

PS00006 162->166 CK2_PHOSPHO_SITE PDOC00006 

PS00006 195->199 CK2_PHOSPHO_SITE PDOC00006 

PS00006 207->211 CK2_PHOSPHO_SITE PDOC00006 

PS00006 235->239 CK2_PHOSPHO_SITE PDOC00006 

PS00006 272->276 CK2_PHOSPHO_SITE PDOC00006 

PS00006 340->344 CK2_PHOSPHO_SITE PDOC00006 

PS00008 153->159 MYRISTYL PDOC00008 

PS00008 158->164 MYRISTYL PDOC00008 

PS00008 284->290 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphtes3_2g7.3) 
DKFZphtes3_2hl 



group: transmembrane protein 

DKFZphtes3_2hl encodes a novel 116 amino acid protein with weak similarity to C. elegans 
cosmid C13F10. 

The novel protein contains 1 transmembrane region. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes and as a new marker for testicular cells. 



similarity to C. elegans C13F10.5 
TRANSMEMBRANE 1 
Sequenced by EMBL 
Locus: /map="2" 
Insert length: 1156 bp 

Poly A stretch at pos. 1143, polyadenylation signal at pos. 1121 



1 GGCCATCAAA ATAACTAAAC CATGTCATTT GGAGCAACAA AGCCACTGCG 
51 GCCTCCATTT GGGCCAAGCT CTGACTGCAA TGATGCCTCT GCCCCGACCC 
101 GGGCCTCGCT GTGACTGACA ATGCCGCTGC ATCTTTTCAG CAGTCATTGA 
151 TGAGGAAGTA TCTACATCCT CCTTCCCACT ACCAGATTTT GCTTGGAGAA 
201 AAGCAGTTTC CTGAAATAAT TCTGTGACGA GCTTCTTCCA CATTAGGACA 
251 AAAATGCTGG AAGCGGCTCA GCCCCAGGGC AGCACATCAG AGACACCATG 
301 GAACACAGCC ATTCCTCTGC CGTCGTGCTG GGACCAGTCT TTCCTGACCA 
351 ATATCACCTT CTTGAAGGTT CTTCTCTGGT TGGTCCTGCT GGGACTGTTT 
401 GTGGAACTGG AATTTGGCCT GGCATATTTT GTCCTGTCCT TGTTCTATTG 
451 GATGTACGTC GGGACACGAG GCCCTGAAGA GAAGAAAGAG GGAGAGAAGA 
501 GCGCCTACTC TGTGTTCAAT CCAGGCTGTG AAGCCATCCA GGGCACCCTG 
551 ACTGCAGAGC AGTTGGAGCG CGAGTTACAG TTGAGACCCC TGGCAGGGAG 
601 ATAGGACCCA GCTGTGCTGT CATGCAGCTA ACCTCTGATG TGGTCTTCCT 
651 CACCATTGGC TATGGATTTG ATTTCAGGTG TATAGGACTA AGGGCAGCTT 
701 GCGGGTTAGC TCTGTGACTG CATAGTTTTT CTACCTTCTT TCCCTGATCT 
751 TTTGCTGCCA TTTGATCTTT GATAGTTTTG GTGAAACTCT CTAAAATACA 
801 TTCACTGTGG GTCCGACGCA ATTTATAAAA ATTATGTACT CAAGAAGGGA 
851 GACCTGTTTG TTTCATTTCT CATCTGTTTG GGAGATGATT TTAGAGCACT 
901 AGAAAGGCAC TGGGGAGATT CTCAGCTTAA AACATCCAGC AGTTTGAAGT 
951 ATGATTAGGT ACATCAGGGC TGCATTGTCA ATGTTCTCTT TAAGTCTTTT 
1001 AACATTTATA GCAATTTTTT TTTTCCCGGA GAGTTTAGGT TGCAAGTTTT 
1051 GGGTTTCTTG TTTGTTTTTG TTTTGCTTCC TGCTTTAATT CTTTAATTTT 
1101 CAGTCATTAC TGGTATTGAA AAATAAAATA TCTTTAAAAC ATCAAAAAAA 
1151 AAAAAA 



BLAST Results 



Entry HS313307 from database EMBL: 
human STS SHGC-16715. 
Score = 1222, P = 1.4e-48, identities = 248/251 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 254 bp to 601 bp; peptide length: 116 
Category: similarity to unknown protein 



1 MLEAAQPQGS TSETPWNTAI PLPSCWDQSF LTNITFLKVL LWLVLLGLFV 
51 ELEFGLAYFV LSLFYWMYVG TRGPEEKKEG EKSAYSVFNP GCEAIQGTLT 
101 AEQLERELQL RPLAGR 
BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2hl, frame 2 

TREMBL:CEUC13F10_2 gene: "C13F10.5"; Caenorhabditis elegans cosmid 
C13F10., N = 1, Score = 141, P = 8.2e-10 

>TREMBL:CEUC13F10_2 gene: "C13F10.5"; Caenorhabditis elegans cosmid 
C13F10. 

Length = 171 

HSPs: 

Score = 141 (21.2 bits), Expect = 8.2e-10, P = 8.2e-10 
Identities = 32/82 (39%), Positives = 52/82 (63%) 
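
The percentages printed next to the identity and positive counts can be recomputed from the fractions. The sketch below is a hedged illustration: the percentages in this listing are consistent with truncation rather than rounding (for example 186/383 is printed as 48%), so integer division is used. The function name `hsp_percentages` is hypothetical.

```python
import re

# Matches "Identities = 32/82" and "Positives = 52/82" in an HSP statistics line.
HSP_STATS = re.compile(r"(Identities|Positives)\s*=\s*(\d+)/(\d+)")


def hsp_percentages(line: str) -> dict[str, int]:
    """Return truncated integer percentages for each statistic found in the line."""
    return {
        name.lower(): 100 * int(matched) // int(aligned)
        for name, matched, aligned in HSP_STATS.findall(line)
    }


if __name__ == "__main__":
    print(hsp_percentages("Identities = 32/82 (39%), Positives = 52/82 (63%)"))
```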

Query: 27 DQSFLTNITFLKVLLWLVLLGLFVELEFGLAYFVLSLFYWMYVGTRGPEEKKEGEKSAYS 86 

+QS ++ T + V++++V L ++FG +F+LSL + Y T G ++ GE SAYS 
Sbjct: 90 EQSVVS--TRIAVWYVVGQALAAWVQFGAVFFILSLILFTYWNT-G--RRRRGEMSAYS 144 

Query: 87 VFNPGCEAIQGTLTAEQLEREL 108 

VFN CE + G++TAE ER++ 
Sbjct: 145 VFNDNCERLAGSMTAEHFERDM 166 

Pedant information for DKFZphtes3_2hl, frame 2 



Report for DKFZphtes3_2hl.2 



[LENGTH] 116 

[MW] 13092.19 

[pi] 4.64 

[PROSITE] MYRISTYL 1 

[PROSITE] CK2_PHOSPHO_SITE 2 

[PROSITE] TYR_PHOSPHO_SITE 2 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 32.76 % 



SEQ MLEAAQPQGSTSETPWNTAIPLPSCWDQSFLTNITFLKVLLWLVLLGLFVELEFGLAYFV 

SEG xxxxxxxxxxxxxxxxxxxxx .... 

PRD ccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhchhhhh 

MEM MMMMMMMMMMMMMMMMM 



SEQ LSLFYWMYVGTRGPEEKKEGEKSAYSVFNPGCEAIQGTLTAEQLERELQLRPLAGR 

SEG xxxxxxxxxxxxxxxxx . . 

PRD hhhhhhhhcccccchhhhhcccceeeecccccccccccchhhhhhhhhhccccccc 

MEM 



Prosite for DKFZphtes3_2hl.2 

PS00001 33->37 ASN_GLYCOSYLATION PDOC00001 

PS00006 10->14 CK2_PHOSPHO_SITE PDOC00006 

PS00006 24->28 CK2_PHOSPHO_SITE PDOC00006 

PS00007 78->86 TYR_PHOSPHO_SITE PDOC00007 

PS00007 77->86 TYR_PHOSPHO_SITE PDOC00007 

PS00008 97->103 MYRISTYL PDOC00008 
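
Rows in the Prosite tables of this listing share one layout: accession, residue span, motif name, documentation entry. As a minimal sketch for working with such rows programmatically, the parser below assumes whitespace-separated fields and motif names written with underscores (no internal spaces); `PrositeHit` and `parse_prosite_row` are hypothetical names.

```python
from typing import NamedTuple


class PrositeHit(NamedTuple):
    accession: str   # e.g. PS00008
    start: int       # first number of the "start->end" span
    end: int         # second number of the span
    name: str        # motif name, e.g. MYRISTYL
    doc: str         # documentation entry, e.g. PDOC00008


def parse_prosite_row(row: str) -> PrositeHit:
    """Parse one row such as 'PS00008 97->103 MYRISTYL PDOC00008'."""
    accession, span, name, doc = row.split()
    start, end = (int(part) for part in span.split("->"))
    return PrositeHit(accession, start, end, name, doc)


if __name__ == "__main__":
    print(parse_prosite_row("PS00008 97->103 MYRISTYL PDOC00008"))
```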



(No Pfam data available for DKFZphtes3_2hl.2) 
DKFZphtes3_2hl5 



group: testes derived 

DKFZphtes3_2hl5 encodes a novel 855 amino acid protein with very weak similarity to S. pombe 
cdc23. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to cdc23 

complete cDNA, complete cds, EST hits 

Sequenced by EMBL 

Locus: unknown 

Insert length: 4619 bp 

Poly A stretch at pos. 4598, polyadenylation signal at pos. 4589 

1 GAAGGCGTCC CGGCATCGGC CAAGATTCTA CATTGCTCAT CTGGGCATCT 
51 GAGCCTCCTT CGAAGTTTCC TGTCACAACT GTCCTCTTGA CAGCATGGAT 

101 GAGGAGGAAG ACAATCTGTC TCTGCTGACC GCACTGCTGG AAGAAAATGA 

151 GTCAGCCTTG GATTGTAATT CAGAAGAAAA TAACTTCTTG ACGCGGGAAA 

201 ATGGCGAGCC CGACGCATTT GATGAGCTCT TTGATGCCGA CGGCGACGGT 

251 GAATCTTATA CAGAAGAGGC TGATGATGGA GAAACAGGAG AGACAAGAGA 

301 CGAAAAGGAA AATCTGGCCA CTCTCTTTGG AGATATGGAG GACTTAACAG 

351 ATGAAGAAGA AGTTCCCGCA TCACAGTCAA CTGAAAATAG GGTCCTCCCT 

401 GCTCCTGCCC CCAGGCGAGA GAAAACGAAT GAAGAGTTGC AAGAGGAATT 

451 AAGGAATTTG CAAGAGCAAA TGAAGGCCTT ACAAGAGCAG CTAAAAGTAA 

501 CAACAATTAA ACAGACAGCA AGCCCAGCCC GTCTGCAAAA ATCCCCTGAG 

551 AAGTCTCCCC GGCCACCTCT TAAGGAGAGG AGAGTTCAGA GAATTCAGGA 

601 GTCAACATGC TTTTCTGCGG AGCTTGATGT CCCTGCGCTA CCAAGAACCA 

651 AGAGGGTGGC TCGAACACCA AAGCCTTCAC CTCCAGATCC CAAAAGCTCA 

701 TCTTCAAGGA TGACAAGTGC ACCCTCCCAA CCCCTACAGA CGATTTCTCG 

751 GAACAAACCT AGTGGGATAA CTAGAGGTCA AATTGTGGGG ACCCCAGGAA 

801 GTTCTGGGGA AACGACTCAA CCCATCTGTG TGGAAGCCTT CTCTGGTCTG 

851 CGGCTCAGGC GGCCTCGAGT ATCCTCCACA GAAATGAACA AGAAAATGAC 

901 CGGCCGAAAA CTGATCAGAC TGTCTCAGAT CAAGGAAAAG ATGGCCAGAG 

951 AGAAGCTGGA AGAAATAGAT TGGGTGACAT TTGGGGTTAT ATTGAAGAAG 
1001 GTTACGCCAC AGAGTGTGAA TAGTGGAAAA ACCTTCAGCA TATGGAAACT 
1051 GAATGATCTT CGTGACCTGA CACAATGTGT GTCCTTGTTC TTATTTGGAG 
1101 AAGTTCACAA AGCGCTCTGG AAGACGGAGC AGGGGACTGT CGTAGGGATC 
1151 CTCAATGCCA ACCCCATGAA GCCCAAGGAT GGTTCAGAGG AGGTGTGTTT 
1201 ATCTATCGAT CATCCTCAGA AGGTCTTAAT TATGGGTGAA GCTCTTGACC 
1251 TGGGAACCTG TAAAGCCAAG AAGAAGAATG GAGAGCCGTG CACGCAGACT 
1301 GTGAATTTGC GTGACTGTGA GTACTGTCAG TACCATGTCC AGGCTCAGTA 
1351 CAAGAAGCTC AGTGCAAAGC GTGCGGATCT GCAGTCCACC TTCTCTGGAG 
1401 GACGAATTCC AAAGAAGTTT GCCCGCAGAG GCACCAGCCT CAAAGAACGG 
1451 CTGTGCCAAG ATGGCTTTTA CTACGGAGGG GTTTCTTCTG CCTCGTATGC 
1501 AGCTTCAATT GCAGCAGCTG TGGCTCCTAA GAAGAAGATT CAAACCACTC 
1551 TGAGTAATCT GGTTGTTAAG GGCACAAACT TGATCATCCA GGAAACACGG 
1601 CAAAAACTCG GAATACCCCA GAAGAGCCTG TCTTGCTCTG AGGAGTTCAA 
1651 GGAACTGATG GACCTGCCGA CGTGTGGAGC CAGGAACTTA AAACAACATT 
1701 TAGCCAAAGC CTCAGCTTCA GGGATTATGG GGAGCCCAAA ACCAGCCATC 
1751 AAGTCCATCT CGGCCTCAGC ACTCTTGAAG CAACAGAAGC AGCGGATGTT 
1801 GGAGATGAGG AGAAGGAAAT CAGAAGAAAT ACAGAAGCGA TTTCTGCAGA 
1851 GCTCAAGTGA AGTTGAGAGC CCAGCTGTGC CATCTTCATC AAGACAGCCC 
1901 CCTGCTCAGC CTCCACGGAC AGGATCCGAG TTCCCCAGGC TGGAGGGAGC 
1951 CCCGGCCACA ATGACGCCCA AGCTGGGGCG AGGTGTCTTG GAAGGAGATG 
2001 ATGTTCTCTT TTATGATGAG TCACCACCAC CAAGACCAAA ACTGAGTGCT 
2051 TTAGCAGAAG CCAAAAAGTT AGCTGCTATC ACCAAATTAA GGGCAAAAGG 
2101 CCAGGTTCTT ACAAAAACAA ACCCAAACAG CATTAAGAAG AAACAAAAGG 
2151 ACCCTCAGGA CATCCTGGAG GTGAAGGAAC GTGTAGAAAA AAACACCATG 
2201 TTTTCTTCTC AAGCTGAGGA TGAATTGGAG CCTGCCAGGA AAAAAAGGAG 
2251 AGAACAACTT GCCTATCTGG AATCTGAGGA ATTTCAGAAA ATCCTAAAAG 
2301 CAAAATCAAA ACACACAGGC ATCCTGAAAG AGGCCGAGGC TGAGATGCAG 
2351 GAGCGCTACT TTGAGCCACT GGTGAAAAAA GAACAAATGG AAGAAAAGAT 
2401 GAGAAACATC AGAGAAGTGA AGTGCCGTGT CGTGACATGC AAGACGTGCG 
2451 CCTATACCCA CTTCAAGCTG CTGGAGACCT GCGTCAGTGA GCAGCATGAA 
2501 TACCACTGGC ATGATGGTGT GAAGAGGTTT TTCAAATGTC CCTGTGGAAA 
2551 CAGAAGCATC TCCTTGGACA GACTCCCGAA CAAGCACTGC AGTAACTGTG 
2601 GCCTCTACAA ATGGGAACGG GACGGAATGC TAAAGGTATG CCATTTGCGT 
2651 ACTAATTTTT GACTCCTTTT AGTGACCCAT GCTAATAATG TGGAACCATC 
2701 TCCTATTAAA ATATTTTCAT TTTTCTAGGA AAAGACTGGT CCAAAGATAG 
2751 GAGGAGAAAC TCTGTTACCA AGAGGAGAAG AACATGCTAA ATTTCTGAAC 
2801 AGCCTTAAAT AACCCGAACT TCAGACATTT TCCCACAGAC TTCCTGGCCT 
2851 CCTGTGACTC TGGAAAGCAA AGGATTGGCT GTGTATTGTC CATTGATTCC 
2901 TGATTGACGC CGTCAAAAAC AAATGCTTGT TAAGCCCATA AGCTTTGCCT 
2951 GCTTACTTTC TGCCATTGGG TTGGTTTGAT ACCACATTTA ACATTGACAT 
3001 TTAAGTGGAA AACCAAGTTA TCATTGTCTT TTCTAAGCTC AGTGTGGATG 
3051 ATTGCATTAC TTCATTCACT GAAGTTTTTG CCCAAAAATT GGAAGGTAAA 
3101 CAGAGAGCTA TGTTTCTGTA TCTTTTGGTT ATAGAGTGTT CACTTCTTTA 
3151 TCATAACAAA ATTCTAGTGT TTATACGAAC ACCCAGAGGC AAAAGAATTT 
3201 GGCTTAATTC TCACTCCAGG TAAGTAGCTT AACTTCTGGG CTTCAGTTTT 
3251 CTCATCTGTA AAATCAGGAA GATTGGACTA AGTGATCCTG AAATGTATTT 
3301 TTTAGCACTG GATTTCTACA AATAATAAAA CTTTCCCATC TAGATAATGA 
3351 TGATCACATA GTCTTGATGT ACGGACATTA AAAGCCAGAT TTCTTCATTC 
3401 AATTCTGTTA TCTCTGTTTT ACTCTTTGAA ATTGATCAAG CCACTGAATC 
3451 ACTTTGCATT TCAGTTTATA TATAGAGAGA GAAAGAAGGC TGTCTGCTCT 
3501 TACATTATTG TGGAGCCCTG TGATAGAAAT ATGTAAAATC TCATATTATT 
3551 TTTTTTTTAA TTTTTTTATT TTTTATGACA GGGTCTCACT ATGTCACCCT 
3601 GGCTGGAGTG CAGTAGTGCG ATCGCGGCAC ACTGCAGCCT TGGCTTCCCT 
3651 GGGCTCAAGC AGTCCTCCCA CCTCAGTCTC CCAAATAGCT AGGACTACAG 
3701 GCGTGCGTGA CCAAGCCCAG CTAATTTTTG CATTTTTTGT AGAGATGGGG 
3751 TTTTGCCATG TTGCTCAGGC TGGTCTCAAA CTCCTGAGCA CTAGCAATCC 
3801 ACCCACCTCT GTTTCCAAAA AAAAAAAAAA AATGAAAGGT CAACCCCTAT 
3851 GCAAATTACC ACAGCAAAGG TTTCATTCAG GAGATTCTTC CATCTGGGCA 
3901 ACCTGGTTTT CCAAATATCA TTTGACCTAA GTGAATGTTG ATACTAGCTA 
3951 AAGATTGGGT AAATTGGTTG AATTATTGTA TTGAAGCTTG AGCTGTAGCT 
4001 AAAAGTAATT TAGGTTTCCC CTAAGATGTT ATTATGTTAG GGACATAACA 
4051 CTTTTGGGAG GTTGTTGTGG GAGATGGTTG ATTTAGGTTT TCAAAAGCTA 
4101 GAAATAAAAT TTACATGCCT TAGATTTCAT AAAATTCTGC TCTAATTGGG 
4151 TGGAAGGTGC TGTATCTAAC TTGTGTTCCT CCTAAGGTTA TGTCCTAATA 
4201 ACTATTCTTT TAGGAGTATA CTTCTACTTT ATAGAAGGTT GCTTTTCTTT 
4251 TTAATTTTTT CTAACAAAGA AAAGAATAAA GTATTTATTA ATAAGAACCA 
4301 GAAAGCACTT GAAACTGATG TTTTTAATGG CTCATTTAGG GTAGATTTAT 
4351 TTATCTCATT AACTTAAAAC AGCTATGTGT ATGAAATAGG TCACAACAGA 
4401 ACTTGAACAC CAGGTTGGTG TCTGAGCAAT CCCTTTCTTA TGGGAAAAAC 
4451 AATGTTCTTG TTTGAACAGA GGGTATCATT GCAGTCAGTA TTCACGTGTA 
4501 TATTGTTATA TAAGTTGTAT AATATGCTTG TAAAGGCTGA GGGTGAGCTG 
4551 TATCTGGATG CCTTTTTACA ATTTGATTTT AACTTTTAAA ATAAATTTAA 
4601 AACATAAAAA AAAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 95 bp to 2659 bp; peptide length: 855 
Category: similarity to known protein 
Classification: Cell division 



1 MDEEEDNLSL LTALLEENES ALDCNSEENN FLTRENGEPD AFDELFDADG 
51 DGESYTEEAD DGETGETRDE KENLATLFGD MEDLTDEEEV PASQSTENRV 
101 LPAPAPRREK TNEELQEELR NLQEQMKALQ EQLKVTTIKQ TASPARLQKS 
151 PEKSPRPPLK ERRVQRIQES TCFSAELDVP ALPRTKRVAR TPKPSPPDPK 
201 SSSSRMTSAP SQPLQTISRN KPSGITRGQI VGTPGSSGET TQPICVEAFS 
251 GLRLRRPRVS STEMNKKMTG RKLIRLSQIK EKMAREKLEE IDWVTFGVIL 
301 KKVTPQSVNS GKTFSIWKLN DLRDLTQCVS LFLFGEVHKA LWKTEQGTVV 
351 GILNANPMKP KDGSEEVCLS IDHPQKVLIM GEALDLGTCK AKKKNGEPCT 
401 QTVNLRDCEY CQYHVQAQYK KLSAKRADLQ STFSGGRIPK KFARRGTSLK 
451 ERLCQDGFYY GGVSSASYAA SIAAAVAPKK KIQTTLSNLV VKGTNLIIQE 
501 TRQKLGIPQK SLSCSEEFKE LMDLPTCGAR NLKQHLAKAS ASGIMGSPKP 
551 AIKSISASAL LKQQKQRMLE MRRRKSEEIQ KRFLQSSSEV ESPAVPSSSR 
601 QPPAQPPRTG SEFPRLEGAP ATMTPKLGRG VLEGDDVLFY DESPPPRPKL 
651 SALAEAKKLA AITKLRAKGQ VLTKTNPNSI KKKQKDPQDI LEVKERVEKN 
701 TMFSSQAEDE LEPARKKRRE QLAYLESEEF QKILKAKSKH TGILKEAEAE 
751 MQERYFEPLV KKEQMEEKMR NIREVKCRVV TCKTCAYTHF KLLETCVSEQ 
801 HEYHWHDGVK RFFKCPCGNR SISLDRLPNK HCSNCGLYKW ERDGMLKVCH 
851 LRTNF 
BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_2hl5, frame 2 

TREMBLNEW:SPBC1347_10 gene: "cdc23"; "SPBC1347.10"; product: "cell 
division cycle protein 23"; S.pombe chromosome II cosmid c1347., N = 
2, Score = 284, P = 7e-21 

PIR:S48384 DNA43 protein - yeast (Saccharomyces cerevisiae), N = 2, 
Score = 203, P = 7e-12 

TREMBL:SCDNA52A_1 gene: "DNA52"; Saccharomyces cerevisiae DNA52 gene, 
complete cds., N = 2, Score = 201, P = 7.9e-12 

TREMBLNEW:AC006234_6 gene: "F5H14.6"; Arabidopsis thaliana chromosome 
II BAC F5H14 genomic sequence, complete sequence., N = 2, Score = 211, 
P = 1.7e-15 

PIR:S48384 DNA43 protein - yeast (Saccharomyces cerevisiae), N = 2, 
Score = 203, P = 7.2e-12 



>TREMBLNEW:SPBC1347_10 gene: "cdc23"; "SPBC1347.10"; product: "cell division 
cycle protein 23"; S.pombe chromosome II cosmid c1347. 
Length = 593 



HSPs: 



Score = 284 (42.6 bits), Expect = 7.0e-21, Sum P(2) = 7.0e-21 
Identities = 97/383 (25%), Positives = 186/383 (48%) 



Query: 109 EKTNEELQEELRNLQEQMKALQEQLKVTTIKQTASPARLQKSPEKSPRPPLKERRVQRIQ 168 

E+ + +L+E + LQ Q+ +QE+ ++ + ++ AS + + PR P ++ RV + 
Sbjct: 8 EENDLDLEE--KRLQRQLNEIQEKKRLRSAQKEASSENAEVI--QVPRSPPQQVRVLTVS 63 

Query: 169 ESTCFSAE LDVPALPRTKRVARTPKPSPPDPKSSSSRMTSAPSQP LQTIS 218 

+ + L + K V+ P P PK R+ A +Q L+T+ 

Sbjct: 64 SPSKLKSPKRLILGIDKGKTGKDVSLGKGPRGPLPKPFHERLAEARNQERKRSDKLKTMK 123 

Query: 219 RNKPSGITRGQIVGTPGSSGETTQPI-C--VEAFSGLRLRRPRVSSTEMNKKMTGRKLIR 275 

+N+ R + + G S E P+ C ++ +S + +S + + G ++ 

Sbjct: 124 KNRKQSFQRKRNILEDGKSEEEKFPMKCDEIDPYSRQAIVIRYISDEVAKENIGGNQVYL 183 

Query: 276 LSQIKEKMAREKLE--EID-WVTFGVILKKV-TPQSVNSGKTFSIWKLNDLRDLTQCVSL 331 

+ Q+ + + K E E+D +V G++ T ++VN K + + L DL+ +C 
Sbjct: 184 IHQLLKLVRAPKFEAPEVDNYVVMGIVASNSGTRETVNGNK-YCMLTLTDLKWQLEC 239 



Query: 332 FLFGEVHKALWKTEQGTVVGILNANPMKPKDGS-EEVCLSIDHPQKVLI-MGEALDLGTC 389 

FLFG+ + WK + GTV+ +LN +KPK+ L +D VL+ +G + LG C 

Sbjct: 240 FLFGKAFERYWKIQSGTVIALLNPEVLKPKNPDIGRFSLKLDSEYDVLLEIGRSKHLGYC 299 

Query: 390 KAKKKNGEPCTQTVNLRDCEYCQYHVQAQYKKLSAKRADLQSTFSGGRIPKKFARRGTSL 449 

+++K+GE C ++ R + C+YHV ++ + R + S+ + P+ ARR 
Sbjct: 300 SSRRKSGELCKHWLDKRAGDVCEYHVDLAVQRSMSTRTEFASSMATMHEPR--ARR 353 

Query: 450 KERLCQDGF--YYGGVSSASYAASIAAAVAPKKKIQT 484 
++R GF Y+ G ++ ++A + +QT 

Sbjct: 354 EKRFRGQGFQGYFAGEKYSAIPNAVAGLYDAEDAVQT 390 

Score = 41 (6.2 bits), Expect = 7.0e-21, Sum P(2) = 7.0e-21 
Identities = 12/43 (27%), Positives = 17/43 (39%) 



Query: 453 LCQDGFYYGGVSSASYAASIAAAVAPKKKIQTTLSNLVVKGTN 495 

L +D S AS A++ K + SN + GTN 

Sbjct: 465 LSKDSEIDSSTKKPSVLASFNASIMNPKSSLPSFSNSAILGTN 507 

Score = 40 (6.0 bits), Expect = 8.9e-21, Sum P(2) = 8.9e-21 
Identities = 13/26 (50%), Positives = 18/26 (69%) 

Query: 536 LAKASASGIMGSPKPAIKSISASALL 561 

LA +AS IM +PK ++ S S SA+L 
Sbjct: 481 LASFNAS-IM-NPKSSLPSFSNSAIL 504 



Pedant information for DKFZphtes3_2hl5, frame 2 



Report for DKFZphtes3_2hl5.2 

[LENGTH] 855 

[MW] 96135.01 

[pI] 8.96 

[HOMOL] TREMBLNEW:SPBC1347_10 gene: "cdc23"; "SPBC1347.10"; product: "cell division 
cycle protein 23"; S.pombe chromosome II cosmid c1347. 5e-16 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YIL150c] 1e-11 

[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YIL150c] 1e-11 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YIL150c] 1e-11 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 12.05 % 

[KW] COILED_COIL 4.21 % 

SEQ MDEEEDNLSLLTALLEENESALDCNSEENNFLTRENGEPDAFDELFDADGDGESYTEEAD 

SEG xxxxx 

PRD cccchhhhhhhhhhhhhhhhccccccccceeeeccccccccceeeecccccccceeeeec 

COILS 

SEQ DGETGETRDEKENLATLFGDMEDLTDEEEVPASQSTENRVLPAPAPRREKTNEELQEELR 

SEG xxxxxxxxxxxx xxxxxxxxx 

PRD cccccccccccchhhhhhcccccccceeeccccccccccccccccccchhhhhhhhhhhh 

COILS CCCCCCCCCCCCCC 

SEQ NLQEQMKALQEQLKVTTIKQTASPARLQKSPEKSPRPPLKERRVQRIQESTCFSAELDVP 

SEG xxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccceeeeecccccccccccccc 

COILS CCCCCCCCCCCCCCCCCCCCCC 

SEQ ALPRTKRVARTPKPSPPDPKSSSSRMTSAPSQPLQTISRNKPSGITRGQIVGTPGSSGET 

SEG xxxxxxxxxxxxx 

PRD cccccceeeecccccccccccchhhhhhhccccchhhhhhccccccceeeeecccccccc 

COILS 

SEQ TQPICVEAFSGLRLRRPRVSSTEMNKKMTGRKLIRLSQIKEKMAREKLEEIDWVTFGVIL 

SEG 

PRD cccccccccchhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhccceeeeeeee 

COILS 

SEQ KKVTPQSVNSGKTFSIWKLNDLRDLTQCVSLFLFGEVHKALWKTEQGTVVGILNANPMKP 

SEG 

PRD cccccccccccceeeeeeeccchhhhhhheeeeecchhhhhhhhccceeeeecccccccc 

COILS 

SEQ KDGSEEVCLSIDHPQKVLIMGEALDLGTCKAKKKNGEPCTQTVNLRDCEYCQYHVQAQYK 

SEG 

PRD ccccceeeeecccccceeeccccccccccccccccccccceeecccccccchhhhhhhhh 

COILS 

SEQ KLSAKRADLQSTFSGGRIPKKFARRGTSLKERLCQDGFYYGGVSSASYAASIAAAVAPKK 

SEG xxxxxxxxxxxxxxxxxxx . . . 

PRD hhhhhhhhhhhhccccccccccccccchhhhhhhccccccccccchhhhhhhhhhhhcch 

COILS 

SEQ KIQTTLSNLVVKGTNLIIQETRQKLGIPQKSLSCSEEFKELMDLPTCGARNLKQHLAKAS 

SEG 

PRD hhhhhhheeecccceeeehhhhhhhcccccccchhhhhhhhhhccccccchhhhhhhhhh 

COILS 

SEQ ASGIMGSPKPAIKSISASALLKQQKQRMLEMRRRKSEEIQKRFLQSSSEVESPAVPSSSR 

SEG xxxxxxxxxxxxxxx 

PRD hhcccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccccc 

COILS 

SEQ QPPAQPPRTGSEFPRLEGAPATMTPKLGRGVLEGDDVLFYDESPPPRPKLSALAEAKKLA 

SEG xxxxxxxx xxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccceeeeeccccccchhhhhhhhhhhhh 

COILS 

SEQ AITKLRAKGQVLTKTNPNSIKKKQKDPQDILEVKERVEKNTMFSSQAEDELEPARKKRRE 

SEG xxxxx 

PRD hhhhhhhhheeeeecccccccccccccchhhhhhhhhhhhccchhhhhhhhhhhhhhhhh 

COILS 

SEQ QLAYLESEEFQKILKAKSKHTGILKEAEAEMQERYFEPLVKKEQMEEKMRNIREVKCRVV 

SEG 

PRD hhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhheee 

COILS 

SEQ TCKTCAYTHFKLLETCVSEQHEYHWHDGVKRFFKCPCGNRSISLDRLPNKHCSNCGLYKW 






SEG 

PRD eeecceeeeeeecccceeeccccccccceeeeeecccccccccccccccccccccceeec 

COILS 

SEQ ERDGMLKVCHLRTNF 

SEG 

PRD ccccccccccccccc 

COILS 



(No Prosite data available for DKFZphtes3_2hl5.2) 
(No Pfam data available for DKFZphtes3_2hl5.2) 
DKFZphtes3_2i5 



group: testes derived 

DKFZphtes3_2i5 encodes a novel 151 amino acid protein with weak similarity to C. elegans 
cosmid F20D12.3 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to C.elegans F20D12.3 

many ATGs in front of the start of the ORF, 
unspliced intron in 5' region? 

Sequenced by EMBL 

Locus: unknown 

Insert length: 2142 bp 

Poly A stretch at pos. 2121, polyadenylation signal at pos. 2102 



1 GCAGTAAATA TGATATGAAA GAATTCTCTA ACTTGGGGGT GGCTTGTAAC 
51 CTGTAATAAA AATATTGCTA AAATACCTTC TCTCACTTTG AAAAAGCATC 
101 TGAGCAATCC TCAGTTATTG GTGAATTCTT ACCAGTGTTT AATTCCTCTC 
151 TTTCCGTTAT GGTCTTAGTG TGGTTGTCCT GGTGTAGTAT TTCAAGAGGA 
201 ACCTGCAGCA AGATGAAAAG AGAGTGGGAC TTGGAGCTAA GAACGTTTTT 
251 GGCTTTAAGT GCTACGTTAA CTCATTAAAT TCTTAGTGAT CTTGGGGAAG 
301 TCCCCTCACC AGTGTGAGCC TCAGTTTTCT TATCTAATAA GTAAGGATAA 
351 TCTTACCCAC CTTATTGCGG GGGCCCGAGG ATTACATGAT TGGTGTAACA 
401 GTAGCACCTT GTACATTTGA AAGGACTAAT ACCAGTGGAC TTTAACCTTG 
451 GCTGGGCTTT GGAATTCTTG GTGGGACTTT TTAATCATGT AGATTCTCAG 
501 GCCCCTGCCT GGCCTGTGGA ACCACAGACT CTATAGGTGG GCCCTTCCAG 
551 AAGGCCTCAT GGGTGGTTCT CATGTGGAAC CTGTGTTGCA AGCCACTGCA 
601 TGGTGTTACT GCTATTAACA TTAAAACTTA TATTTTCCTT ATTGTGTGGA 
651 TATATCTGTG GTGTTTGCCC ATGTATACTT CATTTTACAT TTCTTAAAGA 
701 ATAGAATGGA ATGGTTTTAA GCACGCTACA TTGTCCAGGT TATACCCACA 
751 GAAGAGCTGT TGTGTAACAG AATCAGCATC ATACCTGAAT CATTTGTACA 
801 TTGCATATAA GACTATGTCT AAGTAGAAGA TGCTATGAAA TCATGTCTGC 
851 TGTGGGGCCA GGCATAATTA TGAATGTTAC TTAAGAGCAT AGGTGAGGTG 
901 AGAAAAGGGA ATGTGACTAG TGTTTTAGTA TTTTCTTGGT GTGGGATGAA 
951 GTATAATTCT TTTTTTTTTT TCTCAACAAA GCAGTAAAAC TAGAAAGAAG 
1001 GAGAACTCTT CCCTCAAGAA TGGCTGTACC TTCATATCTA GAGGCACATT 
1051 AAAAAAAAGA ACGTCTGTAC CTTAAAAATG GAGGTCATTT CATTGTGTTC 
1101 ATTTTCAAGG TTGTTGTATG GCTCGGTCAG AACTTTCTGT TACCAGAAGA 
1151 CACTCACATT CAGAATGCTC CATTTCAAGT GTGTTTCACA TCTTTACGGA 
1201 ATGGCGGCCA CCTGCATATA AAAATAAAAC TTAGTGGAGA GATCACTATA 
1251 AATACTGATG ATATTGATTT GGCTGGTGAT ATCATCCAGT CAATGGCATC 
1301 ATTTTTTGCT ATTGAAGACC TTCAAGTAGA AGCGGATTTT CCTGTCTATT 
1351 TTGAGGAATT ACGAAAGGTG CTAGTTAAGG TGGATGAATA TCATTCAGTG 
1401 CATCAGAAGC TCAGTGCTGA TATGGCTGAT CATTCTAATT TGATCCGAAG 
1451 TTTGCTGGTC GGAGCTGAGG ATGCTCGTCT GATGAGGGAC ATGAAAACAA 
1501 TGAAGAGTCG TTATATGGAA CTCTATGACC TTAATAGAGA CTTGCTAAAT 
1551 GGATATAAAA TTCGCTGTAA CAATCACACA GAGCTGTTGG GAAACCTCAA 
1601 AGCAGTAAAT CAAGCAATTC AAAGAGCAGG TCGTCTGCGG GTTGGAAAAC 
1651 CAAAGAACCA GGTGATCACT GCTTGTCGGG ATGCAATTCG AAGCAATAAC 
1701 ATCAACACAC TGTTCAAAAT CATGCGAGTG GGGACAGCTT CTTCCTAGGT 
1751 GAGGAAAATA CAGGTCATGA AGTTCCTGGC AAAGATTTTC TGTTAAAAAC 
1801 CTATGCTGGT TTGCTTTGGA TCACACCCTG GTGAACCCCG GGTGCTAAGA 
1851 ATGAAAATAA CCTTGGTGAG TTGTACAAAT TAAAGACAAA GAACTACATG 
1901 TGAAGATAGA CTTGCTTTCT ATTTTTAAAT CAGTAGTAGT ACTGTTGCTG 
1951 AATAATACTA GGTTTTTATG GAATAGGATG AATGCTTTTG AAGTATTAGG 
2001 GCTTCAGAGT CCAATTTTGC TTATTTATGG TATATAAATA CATATTTTTT 
2051 TCTTGAAATT GCAATTGAGT TTGTACTTTT CAAATAGATT ATCTACTTTT 
2101 TCATTAAAAT GTAAAGATGT TAAAAAAAAA AAAAAAAAAA AA 



BLAST Results 



No BLAST result 



Medline entries 
No Medline entry



Peptide information for frame 3 



ORF from 1293 bp to 1745 bp; peptide length: 151 
Category: similarity to unknown protein 
Classification: no clue 

1 MASFFAIEDL QVEADFPVYF EELRKVLVKV DEYHSVHQKL SADMADHSNL 

51 IRSLLVGAED ARLMRDMKTM KSRYMELYDL NRDLLNGYKI RCNNHTELLG 

101 NLKAVNQAIQ RAGRLRVGKP KNQVITACRD AIRSNNINTL FKIMRVGTAS 
151 S 
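
As a cross-check, the first ten codons of the frame 3 ORF (bases 1293 to 1322 of the insert listed above) can be translated with the standard genetic code. The Python sketch below is illustrative only: the helper name `translate` and the compact codon-table construction are assumptions of this example and are not part of the original annotation pipeline.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Bases 1293-1322 of the DKFZphtes3_2i5 insert, copied from the listing above.
orf_start = "ATGGCATCATTTTTTGCTATTGAAGACCTT"
print(translate(orf_start))  # MASFFAIEDL
```

The output agrees with the first ten residues of the 151 amino acid peptide given above.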



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2i5, frame 3 

TREMBL:CEF20D12_1 gene: "F20D12.3"; Caenorhabditis elegans cosmid
F20D12., N = 1, Score = 173, P = 4.5e-12



>TREMBL:CEF20D12_1 gene: "F20D12.3"; Caenorhabditis elegans cosmid F20D12. 
Length = 699

HSPs: 

Score = 173 (26.0 bits), Expect = 4.5e-12, P = 4.5e-12
Identities = 33/130 (25%), Positives = 72/130 (55%) 

Query: 20 FEELRKVLVKVDEYHSVHQKLSADMADHSNLIRSLLVGAEDARLMRDMKTMKSRYMELYD 79

F+E ++L ++D V +L+A++ + ++ +++ AED+ + ++ + Y+ L 
Sbjct: 569 FKEADEILEEIDPMTEVRDRLTAELQERQAAVKEIIIRAEDSIAIDNIPDARKFYIRLKA 628 

Query: 80 LNRDLLNGYKIRCNNHTELLGNLKAVNQAIQRAGRLRVGKPKNQVITACRDAIRSNNINT 139 

+ ++R NN + +L+ +N+ I+ RLRVG+P Q++ +CR AI +N

Sbjct: 629 NDAAARQAAQLRWNNQERCVKSLRRLNKIIENCSRLRVGEPGRQIVVSCRSAIADDNKQI 688

Query: 140 LFKIMRVGTA 149 

+ KI++ G + 
Sbjct: 689 ITKILQYGAS 698 
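
The identity percentage quoted for the HSP can be reproduced from the raw counts, and identities within a single alignment block can be counted directly from the Query and Sbjct strings. The Python sketch below is a minimal illustration; the function names are hypothetical, and rounding to the nearest integer is an assumption that happens to agree with the figures in this report.

```python
def count_identities(query, sbjct):
    """Count columns where both aligned sequences carry the same residue (gaps ignored)."""
    if len(query) != len(sbjct):
        raise ValueError("aligned strings must have equal length")
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")


def percent(part, total):
    """Percentage rounded to the nearest integer (assumed convention)."""
    return round(100 * part / total)


# Last alignment block of the HSP above (Query 140-149, Sbjct 689-698).
print(count_identities("LFKIMRVGTA", "ITKILQYGAS"))  # 3
# Totals reported for the whole HSP.
print(percent(33, 130), percent(72, 130))  # 25 55
```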



Pedant information for DKFZphtes3_2i5, frame 3 



Report for DKFZphtes3_2i5 . 3 



[LENGTH] 151 

[MW] 17304.07 

[pI] 9.33

[HOMOL] TREMBL:CEF20D12_1 gene: "F20D12.3"; Caenorhabditis elegans cosmid F20D12. 2e

[KW] Alpha_Beta 



SEQ MASFFAIEDLQVEADFPVYFEELRKVLVKVDEYHSVHQKLSADMADHSNLIRSLLVGAED 

PRD ccceeeehhhhhhccccchhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ ARLMRDMKTMKSRYMELYDLNRDLLNGYKIRCNNHTELLGNLKAVNQAIQRAGRLRVGKP 

PRD hhhhhccchhhhhheeeccchhhhhhheeeeeccchhhhhhhhhhhhhhhhhcccccccc 

SEQ KNQVITACRDAIRSNNINTLFKIMRVGTASS 

PRD cceeeeehhhhhhcccceeeeccceeecccc 



(No Prosite data available for DKFZphtes3_2i5.3)
(No Pfam data available for DKFZphtes3_2i5.3)

DKFZphtes3_2119
group: testes derived 

DKFZphtes3_2119 encodes a novel 166 amino acid protein without similarity to known proteins. 

No informative BLAST results; No predictive prosite, pfam or SCOP motif.

The new protein can find application in studying the expression profile of testis-specific
genes.

unknown 

complete cDNA, complete cds, no EST hits 

Sequenced by EMBL 

Locus: unknown 

Insert length: 1079 bp 

Poly A stretch at pos. 1053, polyadenylation signal at pos. 1038 

1 CCACAGGACA CACTGTTCCC AGGGCACAGA CACCCTGGGC TTTGGTTGGG 
51 TCTTGGCCTC CAGGTAGGGC CCTGTTGGGC AGCGGGCAGC AACTCCTGAG 

101 ACACTACTGT GATTCTTGGT GGTGGCTGTG GTAAAAAACC TGCAGGGCTA 

151 GAGTTTGGGG TGAGATTCAG CAGTAACTGT GGCCTCTCCT AGTGACAGTA 

201 TGTCACTCCC ACTCCCAGCA CGCATGCCCA CAGGCCACGG CCTCCACATC 

251 ACAAACCCCC CACCAAGTTG CCCATCTATG GAGCAGCTCC CATACGGCAG 

301 GGTCAGGCTC TTACCTCCAC CTCCAGGGCA CAGACAGGGG GAGCTCTGTC 

351 TCACTGTAAG GCAATGAGGA GAGTTGAGGG CCCAGACCAG GCTAGGGGCC 

401 ATCCCCTTTC CCGAGCAGGC CTCAGGGAAG GACCAGCCCC ATTCCCATCT 

451 GACCTAGGTC TTAGCCCAGG AGCCTGCATA GGGAAGAAAG GACAGACAGG 

501 GCCTCCTTAC TGGCTGACAC TCAGGAGGGG CTGGGGCAAG AGAGCAGAGG 

551 GAGCGCAGGG CCAGGCAGGG GCTGCTGAGG ATCCATGGGA GCTCAGGGTG 

601 CACAAGGGGG CTGCCCTTCC TGGGCTGCAG GCAGCATCCC TATGGGAGCT 

651 GAGAAAGTCC AATCCTGAGA TGGGACAGTG CTGCCCAGGG GTGTGTGGCT 

701 GGGCCCTGAC AACAGTCTCC CCAAAAGTGA CCACATCACC AGGCTCAGTT 

751 CCAGGAAGGC TGAGAAGTGC CCAGTACACT GAGGATGCAC CTCAGTTACA

801 TAAAATAAAT GAAACTGGAG TACTAACGTA CAGTTTAAAG GTTATAGTTA 

851 CTATTTTTAT ATGATATACT AGTAATTTTT GAATAGGGTA AACTTTAGGT 

901 GTTTTGACAC CAAAAGAAAA CTACATGAGT TCATGCATGT GTTAAATTGC 

951 TTTACTGTAG TAATCATTTA CATGTATATG TATATATGAA TATAATTATG 
1001 GGCTCATTAA ATTTAAATAT TATAAATAGG TGACAAAGAA TAAAGTTAAC 
1051 TGGAAAAAAA AAAAAAAAAA AAAAAAAAA 

BLAST Results 

No BLAST result 

Medline entries 

No Medline entry 

Peptide information for frame 1 

ORF from 364 bp to 861 bp; peptide length: 166 
Category: putative protein 
Classification: no clue 

1 MRRVEGPDQA RGHPLSRAGL REGPAPFPSD LGLSPGACIG KKGQTGPPYW 
51 LTLRRGWGKR AEGAQGQAGA AEDPWELRVH KGAALPGLQA ASLWELRKSN 
101 PEMGQCCPGV CGWALTTVSP KVTTSPGSVP GRLRSAQYTE DAPQLHKINE 
151 TGVLTYSLKV IVTIFI 
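
The peptide length and frame number follow from the ORF coordinates by simple arithmetic. The Python sketch below assumes that the coordinates are 1-based and inclusive and that the span excludes the stop codon; both assumptions are consistent with the figures in this listing but are not stated in the source. The function name `orf_summary` is hypothetical.

```python
def orf_summary(start, end):
    """Return (peptide_length, frame) for a 1-based, inclusive ORF span."""
    n_bases = end - start + 1
    if n_bases % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    frame = (start - 1) % 3 + 1  # frames numbered 1 to 3
    return n_bases // 3, frame


print(orf_summary(364, 861))    # (166, 1)
print(orf_summary(1293, 1745))  # (151, 3)
```

The first call matches the 166 residue, frame 1 peptide of DKFZphtes3_2119; the second matches the 151 residue, frame 3 peptide of DKFZphtes3_2i5.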



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_2119, frame 1
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_2119, frame 1 

Report for DKFZphtes3_2119.1



[LENGTH] 166

[MW] 17691.35

[pI] 9.54

[KW] All_Beta

[KW] LOW_COMPLEXITY 7.23 %



SEQ MRRVEGPDQARGHPLSRAGLREGPAPFPSDLGLSPGACIGKKGQTGPPYWLTLRRGWGKR 

SEG 

PRD ccccccccccccccccccccccccccccccccccccceeeccccccccceeeeecccccc 

SEQ AEGAQGQAGAAEDPWELRVHKGAALPGLQAASLWELRKSNPEMGQCCPGVCGWALTTVSP

SEG xxxxxxxxxxxx 

PRD ccccccccccccccceeeeccccccccchhhhhhhhhhcccccccccccccceeeeeccc 

SEQ KVTTSPGSVPGRLRSAQYTEDAPQLHKINETGVLTYSLKVIVTIFI 

SEG 

PRD ccccccccccccccccccccccccceeeccccceeeehhhhhhccc 



(No Prosite data available for DKFZphtes3_2119.1)
(No Pfam data available for DKFZphtes3_2119.1)

DKFZphtes3_2ml8



group: nucleic acid management 

DKFZphtes3_2ml8 encodes a novel 950 amino acid protein with similarity to mouse Dhm1.

The protein seems to play a role in nucleotide metabolism and RNA metabolism, but also in DNA
repair and the cell cycle. The yeast homologue is a DNA strand exchange protein required for
sporulation and homologous recombination.

The novel protein can find application as a multifunctional nuclease / exoribonuclease.



nearly identical to mouse Dhm1

complete cDNA, complete cds, start at Bp 42, EST hits 

Sequenced by EMBL 

Locus: unknown 

Insert length: 3022 bp 

Poly A stretch at pos. 3004, polyadenylation signal at pos. 2981



1 CTCGTCAGCC GGTCGGCCGC CGCCTCCAGC CGTGTGCCGC TATGGGAGTC
51 CCGGCGTTCT TCCGCTGGCT CAGCCGCAAG TACCCGTCCA TCATAGTCAA
101 CTGCGTGGAA GAGAAGCCAA AAGAATGCAA TGGTGTAAAG ATTCCAGTTG
151 ATGCCAGTAA ACCTAATCCA AATGATGTGG AGTTTGATAA TCTGTATTTG
201 GATATGAATG GAATCATCCA TCCCTGTACT CATCCTGAAG ACAAACCAGC
251 ACCAAAAAAT GAAGATGAAA TGATGGTTGC AATTTTTGAG TACATTGACA
301 GACTTTTCAG TATTGTAAGA CCAAGAAGAC TTCTCTACAT GGCAATAGAT
351 GGAGTGGCAC CACGTGCTAA AATGAACCAG CAGCGTTCAA GGAGGTTCAG
401 GGCATCAAAA GAAGGAATGG AAGCAGCAGT CGAGAAGCAG CGAGTCAGGG
451 AAGAAATATT GGCAAAAGGT GGCTTTCTTC CTCCAGAAGA AATAAAAGAA
501 AGATTTGACA GCAACTGTAT TACACCAGGA ACTGAATTCA TGGACAATCT
551 TGCTAAATGC CTTCGCTATT ACATAGCTGA TCGTTTAAAT AATGACCCTG
601 GGTGGAAAAA TTTGACAGTT ATTTTATCTG ATGCTAGTGC TCCTGGTGAA
651 GGAGAACATA AAATCATGGA TTACATTAGA AGGCAAAGAG CCCAGCCTAA
701 CCATGACCCA AATACTCATC ATTGTTTATG TGGAGCAGAT GCTGATCTCA
751 TTATGCTTGG CCTTGCCACA CATGAACCGA ACTTTACCAT TATTAGAGAA
801 GAATTCAAAC CAAACAAGCC CAAACCATGT GGTCTTTGTA ATCAGTTTGG
851 ACATGAGGTC AAAGATTGTG AAGGTTTGCC AAGAGAAAAG AAGGGAAAGC
901 ATGATGAACT TGCCGATAGT CTTCCTTGTG CAGAAGGAGA GTTTATCTTC
951 CTTCGGCTTA ATGTTCTTCG TGAGTATTTG GAAAGAGAAC TCACAATGGC
1001 CAGCCTACCA TTCACATTTG ATGTTGAGAG GAGCATTGAT GACTGGGTTT
1051 TCATGTGCTT CTTTGTGGGA AATGACTTCC TCCCTCATTT GCCATCGTTA
1101 GAGATTAGGG AAAATGCAAT TGACCGTTTG GTTAACATAT ACAAAAATGT
1151 GGTACACAAA ACTGGGGGTT ACCTTACAGA AAGTGGTTAT GTCAATCTGC
1201 AAAGAGTACA GATGATCATG TTAGCAGTTG GTGAAGTTGA GGATAGCATT
1251 TTTAAAAAGA GAAAGGATGA TGAGGACAGT TTTAGAAGAC GACAGAAAGA
1301 AAAAAGAAAG AGAATGAAGA GAGATCAACC AGCTTTCACT CCTAGTGGAA
1351 TATTAACTCC TCATGCCTTG GGTTCAAGAA ATTCACCAGG TTCTCAAGTA
1401 GCCAGTAATC CGAGACAAGC AGCCTATGAA ATGAGGATGC AGAATAACTC
1451 TAGTCCTTCG ATATCTCCTA ATACGAGTTT CACATCTGAT GGCTCCCCGT
1501 CTCCATTAGG AGGAATTAAG CGAAAAGCAG AAGACAGTGA CAGTGAACCT
1551 GAGCCAGAGG ATAATGTCAG GTTATGGGAA GCTGGCTGGA AGCAGCGGTA
1601 CTACAAGAAC AAATTTGATG TGGATGCAGC TGATGAGAAA TTCCGTCGGA
1651 AAGTTGTGCA GTCGTACGTT GAAGGACTTT GCTGGGTTCT TAGATATTAT
1701 TACCAGGGCT GTGCTTCCTG GAAGTGGTAT TATCCATTTC ATTATGCACC
1751 ATTTGCTTCA GACTTTGAAG GCATTGCAGA CATGCCATCT GATTTTGAGA
1801 AGGGTACGAA ACCGTTTAAA CCACTAGAAC AACTTATGGG GGTATTTCCA
1851 GCTGCAAGTG GTAATTTTCT ACCTCCATCA TGGCGGAAGC TCATGAGTGA
1901 TCCTGATTCT AGTATAATTG ACTTCTATCC TGAAGATTTT GCTATTGATT
1951 TGAATGGGAA GAAATATGCA TGGCAAGGTG TTGCTCTCTT GCCATTCGTG
2001 GATGAGCGAA GGCTACGAGC TGCCCTAGAA GAGGTATACC CAGACCTCAC
2051 TCCAGAAGAG ACCAGAAGAA ACAGCCTTGG AGGTGATGTC TTATTTGTGG
2101 GGAAACATCA CCCACTCCAT GACTTCATTT TAGAGCTGTA CCAGACAGGT
2151 TCCACAGAGC CAGTGGAGGT ACCCCCTGAA CTATGTCATG GGATTCAAGG
2201 AAAGTTTTCT TTGGATGAAG AAGCCATTCT TCCAGATCAA ATAGTATGTT
2251 CTCCTGTTCC TATGTTAAGG GATCTGACAC AGAACACTGT AGTCAGTATT
2301 AATTTTAAAG ACCCACAGTT TGCTGAAGAT TACATTTTTA AAGCTGTAAT
2351 GCTTCCAGGA GCAAGAAAGC CAGCAGCAGT ACTGAAACCT AGTGACTGGG
2401 AAAAATCCAG CAATGGACGG CAGTGGAAGC CTCAGCTTGG CTTTAACCGT
2451 GACCGGAGGC CTGTGCACCT GGATCAGGCA GCCTTCAGGA CTTTGGGCCA
2501 TGTGATGCCA AGAGGCTCAG GAACTGGCAT TTACAGCAAT GCTGCACCAC
2551 CACCTGTGAC TTACCAGGGA AACTTATACA GGCCGCTTTT GAGAGGACAA
2601 GCCCAGATTC CAAAACTTAT GTCAAATATG AGGCCCCAGG ATTCCTGGCG
2651 AGGTCCTCCT CCCCTTTTCC AGCAGCAAAG GTTTGACAGA GGCGTTGGGG

2701 CTGAACCTCT GCTCCCATGG AACCGGATGC TGCAAACCCA GAATGCAGCC

2751 TTCCAGCCAA ACCAGTACCA GATGCTAGCT GGGCCTGGTG GGTATCCACC 

2801 CAGACGAGAT GATCGTGGAG GGAGACAGGG ATATCCCAGA GAAGGAAGGA 

2851 AATACCCTTT GCCACCACCC TCAGGAAGAT ACAATTGGAA TTAAGCTTTT 

2901 GTAAAGCTTT CCCAAATCCT TTCATCATTC TACAGTTTTA TGCTATTTGT 

2951 GGAAAGATTT CCTTCTCAAG TAGTAGTTTT TAATAAAACT ACAGTACTTT 

3001 GTGTAAAAAA AAAAAAAAAA AA 



BLAST Results 



No BLAST result 



Medline entries 



95192042:

Characterization of cDNA encoding mouse homolog of fission yeast dhp1+
gene: structural and functional conservation.

97361754:

Cloning and characterization of mouse Dhm2 cDNA, a functional homolog
of budding yeast SEP1.



Peptide information for frame 3 



ORF from 42 bp to 2891 bp; peptide length: 950 
Category: strong similarity to known protein 



1 MGVPAFFRWL SRKYPSIIVN CVEEKPKECN GVKIPVDASK PNPNDVEFDN
51 LYLDMNGIIH PCTHPEDKPA PKNEDEMMVA IFEYIDRLFS IVRPRRLLYM
101 AIDGVAPRAK MNQQRSRRFR ASKEGMEAAV EKQRVREEIL AKGGFLPPEE
151 IKERFDSNCI TPGTEFMDNL AKCLRYYIAD RLNNDPGWKN LTVILSDASA
201 PGEGEHKIMD YIRRQRAQPN HDPNTHHCLC GADADLIMLG LATHEPNFTI
251 IREEFKPNKP KPCGLCNQFG HEVKDCEGLP REKKGKHDEL ADSLPCAEGE
301 FIFLRLNVLR EYLERELTMA SLPFTFDVER SIDDWVFMCF FVGNDFLPHL
351 PSLEIRENAI DRLVNIYKNV VHKTGGYLTE SGYVNLQRVQ MIMLAVGEVE
401 DSIFKKRKDD EDSFRRRQKE KRKRMKRDQP AFTPSGILTP HALGSRNSPG
451 SQVASNPRQA AYEMRMQNNS SPSISPNTSF TSDGSPSPLG GIKRKAEDSD
501 SEPEPEDNVR LWEAGWKQRY YKNKFDVDAA DEKFRRKVVQ SYVEGLCWVL
551 RYYYQGCASW KWYYPFHYAP FASDFEGIAD MPSDFEKGTK PFKPLEQLMG
601 VFPAASGNFL PPSWRKLMSD PDSSIIDFYP EDFAIDLNGK KYAWQGVALL
651 PFVDERRLRA ALEEVYPDLT PEETRRNSLG GDVLFVGKHH PLHDFILELY
701 QTGSTEPVEV PPELCHGIQG KFSLDEEAIL PDQIVCSPVP MLRDLTQNTV
751 VSINFKDPQF AEDYIFKAVM LPGARKPAAV LKPSDWEKSS NGRQWKPQLG
801 FNRDRRPVHL DQAAFRTLGH VMPRGSGTGI YSNAAPPPVT YQGNLYRPLL
851 RGQAQIPKLM SNMRPQDSWR GPPPLFQQQR FDRGVGAEPL LPWNRMLQTQ
901 NAAFQPNQYQ MLAGPGGYPP RRDDRGGRQG YPREGRKYPL PPPSGRYNWN



BLASTP hits



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_2ml8, frame 3 

PIR:I49635 mouse Dhm1 protein - mouse, N = 1, Score = 4765, P = 0

PIR:S43891 dhp1 protein - fission yeast (Schizosaccharomyces pombe), N
= 3, Score = 1172, P = 2e-197

PIR:S20126 exoribonuclease RAT1 (EC 3.1.11.-) - yeast (Saccharomyces
cerevisiae), N = 2, Score = 1146, P = 3.8e-175

PIR:S72531 exonuclease II - fission yeast (Schizosaccharomyces pombe),
N = 4, Score = 622, P = 4.2e-125



>PIR:I49635 mouse Dhm1 protein
Length = 947 

HSPs: 
Score = 4765 (714.9 bits), Expect = 0.0e+00, P = 0.0e+00
Identities = 884/930 (95%), Positives = 895/930 (96%)

Query:   1 MGVPAFFRWLSRKYPSIIVNCVEEKPKECNGVKIPVDASKPNPNDVEFDNLYLDMNGIIH 60
           MGVPAFFRWLSRKYPSIIVNCVEEKPKECNGVKIPVDASKPNPNDVEFDNLYLDMNGIIH
Sbjct:   1 MGVPAFFRWLSRKYPSIIVNCVEEKPKECNGVKIPVDASKPNPNDVEFDNLYLDMNGIIH 60

Query:  61 PCTHPEDKPAPKNEDEMMVAIFEYIDRLFSIVRPRRLLYMAIDGVAPRAKMNQQRSRRFR 120
           PCTHPEDKPAPKNEDEMMVAIFEYIDRLF+IVRPRRLLYMAIDGVAPRAKMNQQRSRRFR
Sbjct:  61 PCTHPEDKPAPKNEDEMMVAIFEYIDRLFNIVRPRRLLYMAIDGVAPRAKMNQQRSRRFR 120

Query: 121 ASKEGMEAAVEKQRVREEILAKGGFLPPEEIKERFDSNCITPGTEFMDNLAKCLRYYIAD 180
           A K GMEAAVEKQRVREEILAKGGFLPPEEIKERFDSNCITPGTEFMDNLAKCLRYYIAD
Sbjct: 121 AIKGGMEAAVEKQRVREEILAKGGFLPPEEIKERFDSNCITPGTEFMDNLAKCLRYYIAD 180

Query: 181 RLNNDPGWKNLTVILSDASAPGEGEHKIMDYIRRQRAQPNHDPNTHHCLCGADADLIMLG 240
           RLNNDPGWKNLTVILSDASAPGEGEHKIMDYIRRQRAQPN DPNTHHCLCGADADLIMLG
Sbjct: 181 RLNNDPGWKNLTVILSDASAPGEGEHKIMDYIRRQRAQPNQDPNTHHCLCGADADLIMLG 240

Query: 241 LATHEPNFTIIREEFKPNKPKPCGLCNQFGHEVKDCEGLPREKKGKHDELADSLPCAEGE 300
           LATHEPNFTIIREEFKPNKPKPC LCNQFGHEVKDCEGLPREKKGKHDELADSLPCAEGE
Sbjct: 241 LATHEPNFTIIREEFKPNKPKPCALCNQFGHEVKDCEGLPREKKGKHDELADSLPCAEGE 300

Query: 301 FIFLRLNVLREYLERELTMASLPFTFDVERSIDDWVFMCFFVGNDFLPHLPSLEIRENAI 360
           FIFLRLNVLREYLERELTMASLPF FDVERS DDW FMCFFVGNDFLPHLPSLEIRE AI
Sbjct: 301 FIFLRLNVLREYLERELTMASLPFPFDVERSNDDWEFMCFFVGNDFLPHLPSLEIREGAI 360

Query: 361 DRLVNIYKNVVHKTGGYLTESGYVNLQRVQMIMLAVGEVEDSIFKKRKDDEDSFRRRQKE 420
           DRLVNIYKNVVHKTGGYLTESGYVNLQRVQMIMLAVGEVEDSIFKKRKDDEDSFRRRQKE
Sbjct: 361 DRLVNIYKNVVHKTGGYLTESGYVNLQRVQMIMLAVGEVEDSIFKKRKDDEDSFRRRQKE 420

Query: 421 KRKRMKRDQPAFTPSGILTPHALGSRNSPGSQVASNPRQAAYEMRMQNNSSPSISPNTSF 480
           KRKRMKRDQPAFTPSGILTPHALGSRNSPG QVASNPRQAAYEMRMQ NSSPSISPNTSF
Sbjct: 421 KRKRMKRDQPAFTPSGILTPHALGSRNSPGCQVASNPRQAAYEMRMQRNSSPSISPNTSF 480

Query: 481 TSDGSPSPLGGIKRKAEDSDSEPEPEDNVRLWEAGWKQRYYKNKFDVDAADEKFRRKVVQ 540
            SDGSPSPLGGI+RKAEDSDSEPEPEDNVRLWEAGWKQRYYKNKFDVDAADEKFRRKVVQ
Sbjct: 481 ASDGSPSPLGGIRRKAEDSDSEPEPEDNVRLWEAGWKQRYYKNKFDVDAADEKFRRKVVQ 540

Query: 541 SYVEGLCWVLRYYYQGCASWKWYYPFHYAPFASDFEGIADMPSDFEKGTKPFKPLEQLMG 600
           SYVEGLCWVLRYYYQGCASWKW YPFHYAPFASDFEGIADM S+FEKGTKPFKPLEQLMG
Sbjct: 541 SYVEGLCWVLRYYYQGCASWKWLYPFHYAPFASDFEGIADMSSEFEKGTKPFKPLEQLMG 600

Query: 601 VFPAASGNFLPPSWRKLMSDPDSSIIDFYPEDFAIDLNGKKYAWQGVALLPFVDERRLRA 660
           VFPAASGNFLPP+WRKLMSDPDSSIIDFYPEDFAIDLNGKKYAWQGVALLPFVDERRLRA
Sbjct: 601 VFPAASGNFLPPTWRKLMSDPDSSIIDFYPEDFAIDLNGKKYAWQGVALLPFVDERRLRA 660

Query: 661 ALEEVYPDLTPEETRRNSLGGDVLFVGKHHPLHDFILELYQTGSTEPVEVPPELCHGIQG 720
           ALEEVYPDLTPEE RRNSLGGDVLFVGK HPL DFILELYQTGSTEPV+VPPELCHGIQG
Sbjct: 661 ALEEVYPDLTPEENRRNSLGGDVLFVGKLHPLRDFILELYQTGSTEPVDVPPELCHGIQG 720

Query: 721 KFSLDEEAILPDQIVCSPVPMLRDLTQNTVVSINFKDPQFAEDYIFKAVMLPGARKPAAV 780
            FSLDEEAILPDQ VCSPVPMLRDLTQNT VSINFKDPQFAEDY+FKA MLPGARKPA V
Sbjct: 721 TFSLDEEAILPDQTVCSPVPMLRDLTQNTAVSINFKDPQFAEDYVFKAAMLPGARKPATV 780

Query: 781 LKPSDWEKSSNGRQWKPQLGFNRDRRPVHLDQAAFRTLGHVMPRGSGTGIYSNAAPPPVT 840
           LKP DWEKSSNGRQWKPQLGFNRDRRPVHLDQAAFRTLGHV PRGSGT +Y+N A  P
Sbjct: 781 LKPGDWEKSSNGRQWKPQLGFNRDRRPVHLDQAAFRTLGHVTPRGSGTSVYTNTALLPAN 840

Query: 841 YQGNLYRPLLRGQAQIPKLMSNMRPQDSWRGPPPLFQQQRFDRGVGAEPLLPWNRMLQTQ 900
           YQGN YRPLLRGQAQIPKLMSNMRP+DSWRGPPPLFQQ RF+R VGAEPLLPWNRM+Q Q
Sbjct: 841 YQGNNYRPLLRGQAQIPKLMSNMRPKDSWRGPPPLFQQHRFERSVGAEPLLPWNRMIQNQ 900

Query: 901 NAAFQPNQYQMLAGPGGYPPRRDD-RGGRQ 929
           NAAFQPNQYQML GPGGYPPRRDD RGGRQ
Sbjct: 901 NAAFQPNQYQMLGGPGGYPPRRDDHRGGRQ 930



Pedant information for DKFZphtes3_2ml8, frame 3 



Report for DKFZphtes3_2ml8.3



[LENGTH] 950

[MW] 108582.68

[pI] 7.26

[HOMOL] PIR:I49635 mouse Dhm1 protein - mouse 0.0

[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YOR048c] 1e-123

[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YOR048c] 1e-123

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YOR048c] 1e-123

[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YGL173c] 3e-79

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YGL173c] 3e-79

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YGL173c] 3e-79

[PIRKW] nucleus 1e-126

[PIRKW] hydrolase 1e-122

[PIRKW] exoribonuclease 1e-122

[PROSITE] MYRISTYL 7

[PROSITE] AMIDATION 2

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 12

[PROSITE] TYR_PHOSPHO_SITE 1

[PROSITE] GLYCOSAMINOGLYCAN 1

[PROSITE] PKC_PHOSPHO_SITE 8

[PROSITE] ASN_GLYCOSYLATION 4

[KW] TRANSMEMBRANE 1

[KW] LOW_COMPLEXITY 6.21 %



SEQ MGVPAFFRWLSRKYPSIIVNCVEEKPKECNGVKIPVDASKPNPNDVEFDNLYLDMNGIIH 

SEG 

PRD cccchhhhhhhhhcceeeeeeecccccccccccccccccccccccccccceeeeccceee 

MEM 



SEQ PCTHPEDKPAPKNEDEMMVAIFEYIDRLFSIVRPRRLLYMAIDGVAPRAKMNQQRSRRFR 

SEG 

PRD ccccccccccccchhhhhhhhhhhhhhhhhhhhcceeeeeeeccccchhhhhhhhhhhhh 

MEM 



SEQ ASKEGMEAAVEKQRVREEILAKGGFLPPEEIKERFDSNCITPGTEFMDNLAKCLRYYIAD 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccccchhhhhhhhhhhhhhhh 

MEM 



SEQ RLNNDPGWKNLTVILSDASAPGEGEHKIMDYIRRQRAQPNHDPNTHHCLCGADADLIMLG 

SEG 

PRD hcccccccceeeeeeeccccccccchhhhhhhhhhhhccccccccccccccccccceeec 

MEM 



SEQ LATHEPNFTIIREEFKPNKPKPCGLCNQFGHEVKDCEGLPREKKGKHDELADSLPCAEGE

SEG 

PRD ccccccccccccccccccccccceeeccccccccccccccchhhhhhhhhcccccccccc 

MEM 



SEQ FIFLRLNVLREYLERELTMASLPFTFDVERSIDDWVFMCFFVGNDFLPHLPSLEIRENAI 

SEG 

PRD ccchhhhhhhhhhhhhhhhhhhchhhhhhhhhhhheeeeeeccccccccccccccchhhh 

MEM MMMMMMMMMMMMMMMMMMM 

SEQ DRLVNIYKNVVHKTGGYLTESGYVNLQRVQMIMLAVGEVEDSIFKKRKDDEDSFRRRQKE

SEG xxxxxx 

PRD hhhhhhhhhhhcccccccccccchhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhhh 

MEM 

SEQ KRKRMKRDQPAFTPSGILTPHALGSRNSPGSQVASNPRQAAYEMRMQNNSSPSISPNTSF 

SEG xxxxxxx xxxxxxxxxxxxx 

PRD hhhhhhhhcccccccccccccccccccccccchhhhhhhhhhhhhhhccccccccccccc 

MEM 



SEQ TSDGSPSPLGGIKRKAEDSDSEPEPEDNVRLWEAGWKQRYYKNKFDVDAADEKFRRKVVQ

SEG xx xxxxxxxxxxx 

PRD ccccccccchhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhh 

MEM 



SEQ SYVEGLCWVLRYYYQGCASWKWYYPFHYAPFASDFEGIADMPSDFEKGTKPFKPLEQLMG 

SEG 

PRD hhhhhhheeeeeeccccccccccccccccccccccccccccccccccccccccchhhhhh 

MEM 



SEQ VFPAASGNFLPPSWRKLMSDPDSSIIDFYPEDFAIDLNGKKYAWQGVALLPFVDERRLRA 

SEG 

PRD hccccccccccccccccccccccceeeccccceeeccccccceeeeeeeeeccchhhhhh 

MEM 



SEQ ALEEVYPDLTPEETRRNSLGGDVLFVGKHHPLHDFILELYQTGSTEPVEVPPELCHGIQG 

SEG 

PRD hhhhhccccchhhhhhcccccceeeeeecccchhhhhhhhhcccccceeecccccccccc 

MEM 

SEQ KFSLDEEAILPDQIVCSPVPMLRDLTQNTVVSINFKDPQFAEDYIFKAVMLPGARKPAAV

SEG

PRD cccccceeecccceeeccccccccccccceeeeecccccchhhhheeeccccccccccee

MEM 

SEQ LKPSDWEKSSNGRQWKPQLGFNRDRRPVHLDQAAFRTLGHVMPRGSGTGIYSNAAPPPVT 

SEG 

PRD eccccccccccccccccccccccccccccchhhhhhhhhhcccccccccccccccccccc 

MEM 

SEQ YQGNLYRPLLRGQAQIPKLMSNMRPQDSWRGPPPLFQQQRFDRGVGAEPLLPWNRMLQTQ 

SEG 

PRD cccccchhhhhcccchhhhhcccccccccccccccchhhhhccccccccccccchhhhhh 

MEM 

SEQ NAAFQPNQYQMLAGPGGYPPRRDDRGGRQGYPREGRKYPLPPPSGRYNWN 

SEG xxxxxxxxxxxxxxxxxxxx 

PRD hcccccccceeecccccccccccccccccccccccccccccccccccccc 

MEM 



Prosite for DKFZphtes3_2ml8.3

PS00001  190->194  ASN_GLYCOSYLATION  PDOC00001
PS00001  247->251  ASN_GLYCOSYLATION  PDOC00001
PS00001  468->472  ASN_GLYCOSYLATION  PDOC00001
PS00001  477->481  ASN_GLYCOSYLATION  PDOC00001
PS00002  826->830  GLYCOSAMINOGLYCAN  PDOC00002
PS00004  675->679  CAMP_PHOSPHO_SITE  PDOC00004
PS00005  11->14    PKC_PHOSPHO_SITE   PDOC00005
PS00005  116->119  PKC_PHOSPHO_SITE   PDOC00005
PS00005  413->416  PKC_PHOSPHO_SITE   PDOC00005
PS00005  559->562  PKC_PHOSPHO_SITE   PDOC00005
PS00005  613->616  PKC_PHOSPHO_SITE   PDOC00005
PS00005  674->677  PKC_PHOSPHO_SITE   PDOC00005
PS00005  868->871  PKC_PHOSPHO_SITE   PDOC00005
PS00005  944->947  PKC_PHOSPHO_SITE   PDOC00005
PS00006  63->67    CK2_PHOSPHO_SITE   PDOC00006
PS00006  331->335  CK2_PHOSPHO_SITE   PDOC00006
PS00006  499->503  CK2_PHOSPHO_SITE   PDOC00006
PS00006  501->505  CK2_PHOSPHO_SITE   PDOC00006
PS00006  541->545  CK2_PHOSPHO_SITE   PDOC00006
PS00006  573->577  CK2_PHOSPHO_SITE   PDOC00006
PS00006  583->587  CK2_PHOSPHO_SITE   PDOC00006
PS00006  619->623  CK2_PHOSPHO_SITE   PDOC00006
PS00006  624->628  CK2_PHOSPHO_SITE   PDOC00006
PS00006  670->674  CK2_PHOSPHO_SITE   PDOC00006
PS00006  723->727  CK2_PHOSPHO_SITE   PDOC00006
PS00006  784->788  CK2_PHOSPHO_SITE   PDOC00006
PS00007  659->667  TYR_PHOSPHO_SITE   PDOC00007
PS00008  125->131  MYRISTYL           PDOC00008
PS00008  375->381  MYRISTYL           PDOC00008
PS00008  450->456  MYRISTYL           PDOC00008
PS00008  600->606  MYRISTYL           PDOC00008
PS00008  825->831  MYRISTYL           PDOC00008
PS00008  829->835  MYRISTYL           PDOC00008
PS00008  926->932  MYRISTYL           PDOC00008
PS00009  638->642  AMIDATION          PDOC00009
PS00009  934->938  AMIDATION          PDOC00009



(No Pfam data available for DKFZphtes3_2ml8.3) 
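
A Prosite hit of this kind can be reproduced with a regular expression. The Python sketch below scans a short stretch of the 950 amino acid peptide for the PS00001 (ASN_GLYCOSYLATION) consensus N-{P}-[ST]-{P}. It is an illustration, not the PEDANT implementation: the function name `scan` is hypothetical, and the reporting convention (1-based start, end equal to start plus motif length) is an assumption inferred from the 190->194 span reported for this peptide.

```python
import re

# PROSITE PS00001 (ASN_GLYCOSYLATION): N-{P}-[ST]-{P}, written as a regex.
# The lookahead allows overlapping matches to be reported.
N_GLYC = re.compile(r"(?=(N[^P][ST][^P]))")


def scan(fragment, offset=0):
    """Return (start, end) pairs: 1-based start, end = start + motif length (assumed convention)."""
    hits = []
    for m in N_GLYC.finditer(fragment):
        start = offset + m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits


# Residues 181-200 of the peptide, copied from the sequence listed above.
fragment = "RLNNDPGWKNLTVILSDASA"
print(scan(fragment, offset=180))  # [(190, 194)]
```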

DKFZphtes3_2m20



group: testes derived 

DKFZphtes3_2m20 encodes a novel 183 amino acid protein without similarity to known proteins. 

No informative BLAST results; No predictive prosite, pfam or SCOP motif.

The new protein can find application in studying the expression profile of testis-specific 
genes. 



group: unknown 

DKFZphtes3_2m20 encodes a novel amino acid protein without similarity to known proteins.

No informative BLAST results; No predictive prosite, pfam or SCOP motif.

The new protein can find application in studying the expression profile of testis-specific
genes.



unknown 

EST hits are only from testis or uterus libraries
remaining intron in 3' UTR, see EST-BLAST

Sequenced by EMBL 

Locus: unknown 

Insert length: 1341 bp 

Poly A stretch at pos. 1320, polyadenylation signal at pos. 1300 



1 GCAATCCAGG AGCTGAATGG TAACTCTTCC ACAAGCGAAA ACTGTTCGTG
51 AATACAAGCA AAAGGCCCCC CAAGAGGACC CCTGATATGA TCCAGCAGCC
101 TCGGGCCCCG CTGGTGTTGG AGAAGGCTTC TGGTGAAGGA TTTGGCAAAA
151 CCGCCGCTAT TATACAGCTC GCTCCTAAAG CTCCTGTTGA CCTGTGTGAG
201 ACAGAGAAAC TGAGGGCAGC CTTCTTTGCA GTCCCGTTGG AAATGAGAGG
251 GTCCTTCCTG GTGCTGCTCC TGAGGGAATG CTTCCGAGAC CTGAGCTGGC
301 TGGCACTCAT CCATAGCGTC CGTGGGGAGG CGGGGCTGCT GGTGACGAGT
351 ATCGTCCCGA AGACCCCGTT TTTCTGGGCC ATGCACATCA CTGAGGCTCT
401 GCACCAGAAC ATGCAGGCTC TGTTTAGCAC CCTGGCTCAG GCGGAGGAGC
451 AGCAGCCCTA CCTGGAGGCT CCACCGTTAT GCGCGGGACT CGCTGTCTGG
501 CAGAGTACCA CCTGGGGGAT TATGGACACG CCTGGAACAG GTGTTGGGTG
551 CTGGACAGGG TGGACACCTG GGCTGTGGTC ATGTTCATTG ATTTTGGACA
601 GTTGGCCACC ATCCCTGTGC AGTCTCTGCG CCAGCTAGAC AGCGACGACT
651 TCTGGACCAT CCCACCCCTG ACTCAGCCAT TCATGCTGGA GAAAGACATT
701 TTGAGTTCGT ATGAGGTTGT CCATCGAATC CTCAAAGGGA AAATCACTGG
751 TGCTTTGAAC TCGGCGGTAA CTGCTCCTGC ATCTAACTTG GCTGTTGTCC
801 CTCCACTCCT GCCCTTGGGG TGTCTGCAGC AGGCTGCTGC CTAGGCCTGG
851 ACACATTGCA CATCCTAAAG TTTGAAGAGT CTAAATAACG GGGCTTCCCT
901 CAGCATGTTC CCTCTCCTGT TTGCCACGGA TCCAGAGCCA CCTGCCCTGT
951 CTTCTCGTAC CCCTTTCACT CTTGAGGCCT GGGAGGTGAA AAAGGCCAGA
1001 CTGTGCCCAG GATTGATTCA ATTTTGCTTT TACTCCCAGC TTCCCTCTCA
1051 AAAGAGAGTG AAGTCTCATT TGTCATGTGT CTTCAGTTCC CCAACTTGGC
1101 ATGAACATTT GAACCAAACA TAGGAAACTA CCATTAGGTT GAAAGCCTGA
1151 GGCAGCTGGG ATGGTCTTTC TTGTGTCTCT TCTTTGCACC CCAGAGCATG
1201 ATATAAGTGG TCCTAACAGA TTCTGGATAA TGGAGAAGCC CTCTGCTGGT
1251 TTTCCTGGCA TTCCATGTAG AATAGGTAGA GAATATTTAA CCAATGAGCA
1301 AATAAATGTT GGCATGTTTC ATGAAAAAAA AAAAAAAAAA A



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 

Peptide information for frame 2



ORF from 479 bp to 841 bp; peptide length: 121 
Category: questionable ORF 
Classification: no clue 

1 MRGTRCLAEY HLGDYGHAWN RCWVLDRVDT WAVVMFIDFG QLATIPVQSL 
51 RQLDSDDFWT IPPLTQPFML EKDILSSYEV VHRILKGKIT GALNSAVTAP 
101 ASNLAVVPPL LPLGCLQQAA A



BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2m20, frame 2 
No Alert BLASTP hits found 

Peptide information for frame 3 



ORF from 87 bp to 635 bp; peptide length: 183 
Category: putative protein 
Classification: no clue 

1 MIQQPRAPLV LEKASGEGFG KTAAIIQLAP KAPVDLCETE KLRAAFFAVP 
51 LEMRGSFLVL LLRECFRDLS WLALIHSVRG EAGLLVTSIV PKTPFFWAMH
101 ITEALHQNMQ ALFSTLAQAE EQQPYLEAPP LCAGLAVWQS TTWGIMDTPG 
151 TGVGCWTGWT PGLWSCSLIL DSWPPSLCSL CAS 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2m20, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_2m20, frame 2 

Report for DKFZphtes3_2m20.2



[LENGTH] 121

[MW] 13436.69

[pI] 5.81

[KW] Alpha_Beta 



SEQ MRGTRCLAEYHLGDYGHAWNRCWVLDRVDTWAVVMFIDFGQLATIPVQSLRQLDSDDFWT

PRD ccchhhhhcccccccccccceeeecccccccceeeeeecccccccccccccccccccccc

SEQ IPPLTQPFMLEKDILSSYEVVHRILKGKITGALNSAVTAPASNLAVVPPLLPLGCLQQAA

PRD cccccchhhhhhhcchhhhhhhhhhcccccchhhhhhcccccceeeeccccccccccccc 

SEQ A 

PRD c 



(No Prosite data available for DKFZphtes3_2m20.2)
(No Pfam data available for DKFZphtes3_2m20.2) 

Pedant information for DKFZphtes3_2m20, frame 3 

Report for DKFZphtes3_2m20.3



[LENGTH] 183

[MW] 19971.49

[pI] 5.31

[KW] Alpha_Beta



SEQ MIQQPRAPLVLEKASGEGFGKTAAIIQLAPKAPVDLCETEKLRAAFFAVPLEMRGSFLVL

PRD ccccccccceeeecccccccccccccccccccccchhhhhhhhhhhhhhhhhcchhhhhh 

SEQ LLRECFRDLSWLALIHSVRGEAGLLVTSIVPKTPFFWAMHITEALHQNMQALFSTLAQAE 

PRD hhhhhhcchhhhhhhhhhcccceeeeeeeccccchhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ EQQPYLEAPPLCAGLAVWQSTTWGIMDTPGTGVGCWTGWTPGLWSCSLILDSWPPSLCSL 

PRD hhhcccccccccccceeeecccceeecccccccccccccccccccceeeeccccccceee 

SEQ CAS 

PRD ccc



(No Prosite data available for DKFZphtes3_2m20.3)
(No Pfam data available for DKFZphtes3_2m20.3)

DKFZphtes3_2n9



group: testes derived 

DKFZphtes3_2n9 encodes a novel 184 amino acid protein with very weak similarity to Homo
sapiens PAC clone DJ0771P04 from 7q11.21-q11.23.

No informative BLAST results; No predictive prosite, pfam or SCOP motif.

The new protein can find application in studying the expression profile of testis-specific
genes.

unknown 

on genomic level encoded by HS1186N24, no splice pattern but EST 
matches 

Sequenced by EMBL 

Locus: unknown

Insert length: 1000 bp 

Poly A stretch at pos. 988, polyadenylation signal at pos. 970 

1 CAACTTTTTA AAGATGTGAA TTGGACAGCC AGACTTGCTT ATTTGTCTGA 
51 TATCTTCAGT ATTTTTTAAT GATCTTAATG CTTCTATGCA AGGGAAGAAT 
101 GCAACTTATT TTTCAATGGC AGATAAAGTT GAAGGACAAA AACAGAAGTT 
151 AGAAGCTTGG AAAAACAGAA TTTCTACAGA TTGTTATGAC ATGTTTCATA 
201 ATTTAACAAC AATTATCAAT GAAGTAGGTA ATGATCTTGA TATTGCACAT 
251 CTGCGAAAAG TTATCAGTGA ACATCTTACA AATTTGTTAG AATGTTTTGA 
301 ATTTTATTTT CCATCAAAAG AAGATCCACG CATAGGAAAT TTGTGGATCC 
351 AAAATCCATT TCTTTCATCA AAAGATAACT TAAATTTAAC TGTAACTCTA 
401 CAGGATAAGT TGTTGAAGCT GGCTACCGAC GAAGGATTGA AAATCAGTTT 
451 TGAAAATACA GCATCACTTC CTTCATTTTG GATAAAAGCT AAAAATGACT 
501 ATCCTGAGCT TGCTGAGATT GCTTTAAAAT TGCTGCTTCT TTTCCCCTCA 
551 ACATACCTCT GTGAGACCGG ATTCTCTACT TTAAGTGTTA TTAAAACAAA 
601 ACATAGAAAC AGTTTAAATA TACATTATCC CCTGAGGTAG CATTGTCATC 
651 AATCCAACCT AGATTAGACA AATTAACAAG CAAGAAGCAA GCTCACTTAT 
701 CACATTAAAA GCTTTAAATA TTGATATGTA AGGTATTGGT TCAAAGTATG 
751 CATATAAGCA TTGAGTGTGA GGAATTTGCT ATTTCACTTT AAACTTTCTG 
801 TCTAGTTACA GTTATGGAAG TATGAGAAGT TATGAGTGAA ACAGCAATTT 
851 TCTATATAAA TTGCCTATAT GTATATTTTC AATTAAGAAT GTGTACAGTT
901 TTTATAATTC TATTTTTCCT CATATTTGTC GTATTTATTA AAATATAATT 
951 TTAAATCTGT TGATTCTAAT ATTAAAACAT TTGATCTTAA AAAAAAAAAA 
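
The reported poly A stretch and polyadenylation signal positions can be located by a simple pattern search over the 3' end of the insert. The Python sketch below is a minimal illustration and not the tool used to produce this report: the function name `find_polya_features` is hypothetical, the signal is taken to be the first AATAAA or ATTAAA hexamer, and positions are counted from zero, an assumption that reproduces the values 970 and 988 given for this clone.

```python
import re

# 3' end of the DKFZphtes3_2n9 insert (bases 951-1000, copied from the listing above).
TAIL = "TTAAATCTGTTGATTCTAATATTAAAACATTTGATCTTAAAAAAAAAAAA"
TAIL_OFFSET = 950  # number of bases that precede TAIL in the full insert


def find_polya_features(tail, offset, min_run=10):
    """Return 0-based positions of the first AATAAA/ATTAAA hexamer and of the terminal A run."""
    signal = re.search(r"AATAAA|ATTAAA", tail)
    run = re.search(r"A{%d,}$" % min_run, tail)
    signal_pos = offset + signal.start() if signal else None
    run_pos = offset + run.start() if run else None
    return signal_pos, run_pos


print(find_polya_features(TAIL, TAIL_OFFSET))  # (970, 988)
```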



BLAST Results 



Entry HS1186N24 from database EMBLNEW: 

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 1186N24 
Score = 4921, P = 5.8e-215, identities = 989/992



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 86 bp to 637 bp; peptide length: 184 
Category: similarity to unknown protein 
Classification: no clue 

1 MQGKNATYFS MADKVEGQKQ KLEAWKNRIS TDCYDMFHNL TTIINEVGND 
51 LDIAHLRKVI SEHLTNLLEC FEFYFPSKED PRIGNLWIQN PFLSSKDNLN 
101 LTVTLQDKLL KLATDEGLKI SFENTASLPS FWIKAKNDYP ELAEIALKLL 
151 LLFPSTYLCE TGFSTLSVIK TKHRNSLNIH YPLR 



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_2n9, frame 2

TREMBLNEW:AC004883_3 gene: "WUGSC:H_DJ0771P04.2"; Homo sapiens PAC
clone DJ0771P04 from 7q11.21-q11.23, complete sequence., N = 1, Score =
94, P = 0.042



>TREMBLNEW:AC004883_3 gene: "WUGSC:H_DJ0771P04.2"; Homo sapiens PAC clone
DJ0771P04 from 7q11.21-q11.23, complete sequence.
Length = 533

HSPs:

Score = 94 (14.1 bits), Expect = 4.3e-02, P = 4.2e-02
Identities = 39/177 (22%), Positives = 75/177 (42%)



Query: 1 MQGKNATYFSMADKVEGQKQKLEAWKNRISTDCYDMFHNLTTIINEVGNDLD-IAHLRKV 59 

+QG + M D + KL W+ ++ + F L + L+ I + ++ 

Sbjct: 354 LQGHSQIVTQMYDLIRAFLAKLCLWETHLTRNNLAHFPTLKLASRNESDGLNYIPKIAEL 413 

Query: 60 ISEHLTNLLECFEFYFPSKEDPRIGNLWIQNPFLSSKDNLNLTVTLQDKLLKLATDEGLK 119 

+E L + F+ Y + + + +PF + D+++ LQ +++ L + LK 

Sbjct: 414 KTEFQKRLSD-FKLY ESELTL FSSPFSTKIDSVH--EELQMEVIDLQCNTVLK 463

Query: 120 ISFENTASLPSFWIKAKNDYPXXXXXXXXXXXXFPSTYLCETGFSTLSVIKTKHRNSL 177 

++■ +P F+ YP F STY+CE FS + + KTK+ + L 

Sbjct: 464 TKYDKVG-IPEFYKYLWGSYPKYKHHCAKILSMFGSTYICEQLFSIMKLSKTKYCSQL 520

Pedant information for DKFZphtes3_2n9, frame 2 



Report for DKFZphtes3_2n9 . 2 

[LENGTH] 184 

[MW] 21203.53 

[pI] 6.52

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 6.52 %



SEQ MQGKNATYFSMADKVEGQKQKLEAWKNRISTDCYDMFHNLTTIINEVGNDLDIAHLRKVI 

SEG 

PRD ccccccchhhhhhhhhhhhhhhhhhhhhhcchhhhhcccceeecccccccchhhhhhhhh 

SEQ SEHLTNLLECFEFYFPSKEDPRIGNLWIQNPFLSSKDNLNLTVTLQDKLLKLATDEGLKI 

SEG 

PRD hhhhhhhhhhhhcccccccccccceeeeccccccccccceeeeehhhhhhhhhhhcccee 

SEQ SFENTASLPSFWIKAKNDYPELAEIALKLLLLFPSTYLCETGFSTLSVIKTKHRNSLNIH 

SEG xxxxxxxxxxxx 

PRD eecccccccceeeeecccchhhhhhhhhhhhhcccccccccccceeeeeecccccceeec 

SEQ YPLR 

SEG 

PRD cccc



(No Prosite data available for DKFZphtes3_2n9.2) 
(No Pfam data available for DKFZphtes3_2n9.2) 

DKFZphtes3_30f4



group: testes derived 

DKFZphtes3_30f4 encodes a novel 192 amino acid protein without similarity to known proteins.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif.

The new protein can find application in studying the expression profile of testis-specific
genes.

unknown 

Sequenced by LMU 

Locus: /map="717.2-8 cR from top of Chr8 linkage group"
Insert length: 1388 bp 

Poly A stretch at pos. 1330, polyadenylation signal at pos. 1310 

1 CACTGAGCCC TCCTCAGATG GTTAGTGGCT TCCAACAGCC ATCAGGAGTG 
51 TTTCTTGAAT GCCCCAGGTG TGGAGGACTT GGTCTGTGAC CACCTAGAAC 

101 CCCAGAGCTG AACAGGAAGC CGTCCCTGCA GCAACAAGAG GGCTGGAAGG 

151 GGGAGCTGCA GGCCACCCTC GGCTCTCCCA CTGCTGGGGC GGTGATGTTC 

201 GGGTGACATG TTTGAAAAAT ACTCTTAAAG ATACCAACTG TTCCCTTATA 

251 TGGCTAATGG TTTGTGCAGC CACCAGCGAT GGCGGCCCCT ATTAGAGACC 

301 AGGTTTGTTA AAACACCAAA TATTGCTGTC CACACTAGAC ATTAACCGGC

351 TTCAGAAAAG ATGGACACCT TTTCCCACGC TGTTTCGCTT CTTAACTTTG 

401 GTCCAGCTTT AGCCACCACA CAGCGTGTGA GGGACTGCTG CTGCGGAGTC 

451 AGCCTCGTTT GTCCCTCCGC CTCCCACCAG CATGCGCCGC TTCTGAGAGA 

501 CACCAGCTCC CTGCCTCCAA GCCTGGTGCC ACAGGCCTGT CGTGAGGGAC 

551 CCCTGCTTCC GAGAGCTCCT GGGGGGGTTC TGCCCTTCAC CACCTGGGAG 

601 AGGTGTCAGT TCAGTTCCGA GTTGAACAAG GCCCGTGCAC ACAGCATGTT 

651 GGGGGCCCAG CCCAAAGTTC TTGTCACCTC CTCATGCAAA GCCAGCCATC 

701 ACCCTCCGGC CAGAGCTCAA GGTGGCCCCT TGGCCAGCCC CTCCTTGGGT 

751 CCTCCAGGAG GACTGAGCAC CCCTCCTAGC GGCATCCCTT GCCCTCCACA 

801 GTGCTGCCAG GGGCACGTCG CTCTGTGCCG TGGACTGAGA CCATCCCCTG 

851 GTGACAGAAT GACCCGTTTG TTGGAAATGC CTCGTTGCCA GAGAAACTCC 

901 CCAGGCATCT CGGAACGAAA CTATTTAGTT CCATTGTGAA CTGGCCACGG 

951 GACAGCTTTT TATCAACTTA TTAAGTTGGA GCACTGTAAT CGCGCTTGCT 
1001 GAGTTAGCAG TGGTGGTAAG CGTGTGTTAA ACACATAATG TTACGTTTTA 
1051 GGAGAGAGAG GTCGTAAGGA AGTGTCGTGT CGCTCATGAC TCTCTTCTAT 
1101 TAGTTGGGTA ACAGTGGCCT CATGTTTGTG TCTGTGTGTA CACAGAGCCC 
1151 TTAGGTTCTG CTCTGTTTCT TTGCCAGGTG AATGTTTGTG GCATGCGCTG
1201 CTGTCCGCGC CCCTCTGTCC TGCGCAGGGT TCAGCTGTGC GGCGCCCTGA 
1251 TTTCCTCCAT GCACACAGAA CCTCCTTGTG TCTGTTTCTC TGTTCCTCTG 
1301 TGGCTGACTC AATAAACTTT TCCCTCTGAC ATGAAAAAAA AAAAAAAAAA 
1351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAG 



BLAST Results 



Entry HS548358 from database EMBL: 
human STS EST67250. 
Score = 2126, P = 1.5e-89, identities = 444/472 

Entry HS670351 from database EMBL: 
human STS WI-18501. 
Score = 2089, P = 7.1e-88, identities = 445/476 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 361 bp to 936 bp; peptide length: 192 
Category: putative protein 
Classification: no clue 
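
The ORF coordinates and the peptide length above are tied together by simple arithmetic. A minimal sketch, assuming the coordinates are 1-based and inclusive and that the stop codon is not counted (an assumption that is consistent with the numbers reported here); the function name is illustrative.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of codons in an ORF given 1-based, inclusive coordinates.

    Assumption: the stop codon lies outside [orf_start, orf_end].
    """
    n_bases = orf_end - orf_start + 1
    if n_bases <= 0 or n_bases % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return n_bases // 3


if __name__ == "__main__":
    # ORF from 361 bp to 936 bp, as reported above.
    print(peptide_length(361, 936))  # prints 192
```

The span 361 to 936 covers 576 bases, that is 192 codons, matching the stated peptide length.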



1 MDTFSHAVSL LNFGPALATT QRVRDCCCGV SLVCPSASHQ HAPLLRDTSS 

51 LPPSLVPQAC REGPLLPRAP GGVLPFTTWE RCQFSSELNK ARAHSMLGAQ 

101 PKVLVTSSCK ASHHPPARAQ GGPLASPSLG PPGGLSTPPS GIPCPPQCCQ 

151 GHVALCRGLR PSPGDRMTRL LEMPRCQRNS PGISERNYLV PL 
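
The peptide above is the conceptual translation of the ORF. A minimal sketch of that step, assuming the standard genetic code (the source does not name the translation table); `translate` is a hypothetical helper, and the example input is the first 30 bases of the insert sequence starting at position 361.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon until a stop codon or the end."""
    dna = dna.upper().replace(" ", "")
    peptide = []
    for pos in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE.get(dna[pos:pos + 3], "X")  # X marks unknown codons
        if amino == "*":
            break
        peptide.append(amino)
    return "".join(peptide)


if __name__ == "__main__":
    # Bases 361-390 of the insert sequence listed above.
    print(translate("ATGGACACCT TTTCCCACGC TGTTTCGCTT"))  # prints MDTFSHAVSL
```

The output agrees with the first ten residues of the listed peptide.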



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_30f4, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphtes3_30f4, frame 1

Report for DKFZphtes3_30f4.1

[LENGTH] 192

[MW] 20281.56

[pI] 9.21

[BLOCKS] BL01013C Oxysterol-binding protein family proteins 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 10.94 % 



SEQ MDTFSHAVSLLNFGPALATTQRVRDCCCGVSLVCPSASHQHAPLLRDTSSLPPSLVPQAC 

SEG 

PRD ccchhhhheeecccccchhhhhhhhcccceeeeccccccccccccccccccccccccccc 

SEQ REGPLLPRAPGGVLPFTTWERCQFSSELNKARAHSMLGAQPKVLVTSSCKASHHPPARAQ 

SEG 

PRD cccccccccccccccccccccchhhhhhhhhhhhhhccccceeeeecccccccccccccc 

SEQ GGPLASPSLGPPGGLSTPPSGIPCPPQCCQGHVALCRGLRPSPGDRMTRLLEMPRCQRNS 

SEG xxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccchhhhhhhhcccccccchhhhhccccccccc 

SEQ PGISERNYLVPL 

SEG 

PRD cccccccccccc 
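
The LOW_COMPLEXITY keyword in the report can be related to the SEG rows above. A minimal sketch, assuming the percentage is the number of `x`-masked positions divided by the protein length (an assumption, although it reproduces the reported values); the function name is illustrative.

```python
def low_complexity_percent(seg_mask: str, protein_length: int) -> float:
    """Percentage of residues masked with 'x' in the SEG rows.

    Assumption: each 'x' marks exactly one low-complexity residue.
    """
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    masked = seg_mask.lower().count("x")
    return 100.0 * masked / protein_length


if __name__ == "__main__":
    # 21 masked positions in a 192-residue protein.
    print(f"{low_complexity_percent('x' * 21, 192):.2f} %")
```

With 21 masked positions out of 192 residues the result is about 10.94 %, the value given in the report.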



(No Prosite data available for DKFZphtes3_30f4.1)
(No Pfam data available for DKFZphtes3_30f4.1)
DKFZphtes3_35b4



group: cell cycle 

DKFZphtes3_35b4 encodes a novel 1780 amino acid protein whose C-terminus is identical to human
M-phase phosphoprotein-1 (MPP1).

The novel protein contains an N-terminal Pfam kinesin motor domain and an ATP/GTP-binding site
motif A (P-loop). MPP1 is expressed and phosphorylated in metaphase. The novel protein
therefore seems to be involved in the mitotic spindle during cell division.

The new protein can find application in modulation of the mitotic spindle.



"M-phase phosphoprotein-1" extension 
motor protein 
Sequenced by DKFZ 

Locus: /map="750_H_1; 758_H_7; 759_C_9; 847_D_4; 906_D_1; 931_D_3; 944_C_1; 750_G_12;
B00_A_11; 512.1 cR from top of Chr10 linkage group"

Insert length: 6284 bp 

No poly A stretch found, no polyadenylation signal found 



1 ATCGCAGTGC TGCTCGCGGG TCTGGCTAGT CAGGCGAAGT TTGCAGAATG 
51 GAATCTAATT TTAATCAAGA GGGAGTACCT CGACCATCTT ATGTTTTTAG 
101 TGCTGACCCA ATTGCAAGGC CTTCAGAAAT AAATTTCGAT GGCATTAAGC 
151 TTGATCTGTC TCATGAATTT TCCTTAGTTG CTCCAAATAC TGAGGCAAAC 
201 AGTTTCGAAT CTAAAGATTA TCTCCAGGTT TGTCTTCGAA TAAGACCATT 
251 TACACAGTCA GAAAAAGAAC TTGAGTCTGA GGGCTGTGTG CATATTCTGG 
301 ATTCACAGAC TGTTGTGCTG AAAGAGCCTC AATGCATCCT TGGTCGGTTA 
351 AGTGAAAAAA GCTCAGGGCA GATGGCACAG AAATTCAGTT TTTCCAAGGT 
401 TTTTGGCCCA GCAACTACAC AGAAGGAATT CTTTCAGGGT TGCATTATGC 
451 AACCAGTAAA AGACCTCTTG AAAGGACAGA GTCGTCTGAT TTTTACTTAC 
501 GGGCTAACCA ATTCAGGAAA AACATATACA TTTCAAGGGA CAGAAGAAAA 
551 TATTGGCATT CTGCCTCGAA CTTTGAATGT ATTATTTGAT AGTCTTCAAG 
601 AAAGACTGTA TACAAAGATG AACCTTAAAC CACATAGATC CAGAGAATAC 
651 TTAAGGTTAT CATCAGAACA AGAGAAAGAA GAAATTGCTA GCAAAAGTGC 
701 ATTGCTTCGG CAAATTAAAG AGGTTACTGT GCATAATGAT AGTGATGATA 
751 CTCTTTATGG AAGTTTAACT AACTCTTTGA ATATCTCAGA GTTTGAAGAA 
801 TCCATAAAAG ATTATGAACA AGCCAACTTG AATATGGCTA ATAGTATAAA 
851 ATTTTCTGTG TGGGTTTCTT TCTTTGAAAT TTACAATGAA TATATTTATG 
901 ACTTATTTGT TCCTGTATCA TCTAAATTCC AAAAGAGAAA GATGCTGCGC 
951 CTTTCCCAAG ACGTAAAGGG CTATTCTTTT ATAAAAGATC TACAATGGAT 
1001 TCAAGTATCT GATTCCAAAG AAGCCTATAG ACTTTTAAAA CTAGGAATAA 
1051 AGCACCAGAG TGTTGCCTTC ACAAAATTGA ATAATGCTTC CAGTAGAAGT 
1101 CACAGCATAT TCACTGTTAA AATATTACAG ATTGAAGATT CTGAAATGTC 
1151 TCGTGTAATT CGAGTCAGTG AATTATCTTT ATGTGATCTT GCTGGTTCAG 
1201 AACGAACTAT GAAGACACAG AATGAAGGTG AAAGGTTAAG AGAGACTGGG 
1251 AATATCAACA CTTCTTTATT GACTCTGGGA AAGTGTATTA ACGTCTTGAA 
1301 GAATAGTGAA AAGTCAAAGT TTCAACAGCA TGTGCCTTTC CGGGAAAGTA 
1351 AACTGACTCA CTATTTTCAA AGTTTTTTTA ATGGTAAAGG GAAAATTTGT 
1401 ATGATTGTCA ATATCAGCCA ATGTTATTTA GCCTATGATG AAACACTCAA 
1451 TGTATTGAAG TTCTCCGCCA TTGCACAAAA AGTTTGTGTC CCAGACACTT 
1501 TAAATTCCTC TCAAGATAAA TTATTTGGAC CTGTCAAATC TTCTCAAGAT 
1551 GTATCACTAG ACAGTAATTC AAACAGTAAA ATATTAAATG TAAAAAGAGC 
1601 CACCATTTCA TGGGAAAATA GTCTAGAAGA TTTGATGGAA GACGAGGATT 
1651 TGGTTGAGGA GCTAGAAAAC GCTGAAGAAA CTCAAAATGT GGAAACTAAA 
1701 CTTCTTGATG AAGATCTAGA TAAAACATTA GAGGAAAATA AGGCTTTCAT 
1751 TAGCCACGAG GAGAAAAGAA AACTGTTGGA CTTAATAGAA GACTTGAAAA 
1801 AAAAACTGAT AAATGAAAAA AAGGAAAAAT TAACCTTGGA ATTTAAAATT
1851 CGAGAAGAAG TTACACAGGA GTTTACTCAG TATTGGGCTC AACGGGAAGC 
1901 TGACTTTAAG GAGACTCTGC TTCAAGAACG AGAGATATTA GAAGAAAATG 
1951 CTGAACGTCG TTTGGCTATC TTCAAGGATT TGGTTGGTAA ATGTGACACT 
2001 CGAGAAGAAG CAGCGAAAGA CATTTGTGCC ACAAAAGTTG AAACTGAAGA 
2051 AGCTACTGCT TGTTTAGAAC TAAAGTTTAA TCAAATTAAA GCTGAATTAG 
2101 CTAAAACCAA AGGAGAATTA ATCAAAACCA AAGAAGAGTT AAAAAAGAGA 
2151 GAAAATGAAT CAGATTCATT GATTCAAGAG CTTGAGACAT CTAATAAGAA 
2201 AATAATTACA CAGAATCAAA GAATTAAAGA ATTGATAAAT ATAATTGATC 
2251 AAAAAGAAGA TACTATCAAC GAATTTCAGA ACCTAAAGTC TCATATGGAA 
2301 AACACATTTA AATGCAATGA CAAGGCTGAT ACATCTTCTT TAATAATAAA 
2351 CAATAAATTG ATTTGTAATG AAACAGTTGA AGTACCTAAG GACAGCAAAT 
2401 CTAAAATCTG TTCAGAAAGA AAAAGAGTAA ATGAAAATGA ACTTCAGCAA 
2451 GATGAACCAC CAGCAAAGAA AGGGTCTATC CATGTTAGTT CAGCTATCAC 
2501 TGAAGACCAA AAGAAAAGTG AAGAAGTGCG ACCGAACATT GCAGAAATTG 
2551 AAGACATCAG AGTTTTACAA GAAAATAATG AAGGACTGAG AGCATTTTTA 
2601 CTCACTATTG AGAATGAACT TAAAAATGAA AAGGAAGAAA AAGCAGAATT
2651 AAATAAACAG ATTGTTCATT TTCAGCAGGA ACTTTCTCTT TCTGAAAAAA 
2701 AGAATTTAAC TTTAAGTAAA GAGGTCCAAC AAATTCAGTC AAATTATGAT 
2751 ATTGCAATTG CTGAATTACA TGTGCAGAAA AGTAAAAATC AAGAACAGGA 
2801 GGAAAAGATC ATGAAATTGT CAAATGAGAT AGAAACTGCT ACAAGAAGCA 
2851 TTACAAATAA TGTTTCACAA ATAAAATTAA TGCACACGAA AATAGACGAA 
2901 CTACGTACTC TTGATTCAGT TTCTCAGATT TCAAACATAG ATTTGCTCAA 
2951 TCTCAGGGAT CTGTCAAATG GTTCTGAGGA GGATAATTTG CCAAATACAC 
3001 AGTTAGACCT TTTAGGTAAT GATTATTTGG TAAGTAAGCA AGTTAAAGAA 
3051 TATCGAATTC AAGAACCCAA TAGGGAAAAT TCTTTCCACT CTAGTATTGA 
3101 AGCTATTTGG GAAGAATGTA AAGAGATTGT GAAGGCCTCT TCCAAAAAAA 
3151 GTCATCAGAT TGAGGAACTG GAACAACAAA TTGAAAAATT GCAGGCAGAA 
3201 GTAAAAGGCT ATAAGGATGA AAACAATAGA CTAAAGGAGA AGGAGCATAA 
3251 AAACCAAGAT GACCTACTAA AAGAAAAAGA AACTCTTATA CAGCAGCTGA 
3301 AAGAAGAATT GCAAGAAAAA AATGTTACTC TTGATGTTCA AATACAGCAT 
3351 GTAGTTGAAG GAAAGAGAGC GCTTTCAGAA CTTACACAAG GTGTTACTTG 
3401 CTATAAGGCA AAAATAAAGG AACTTGAAAC AATTTTAGAG ACTCAGAAAG 
3451 TTGAACGTAG TCATTCAGCC AAGTTAGAAC AAGACATTTT GGAAAAGGAA 
3501 TCTATCATCT TAAAGCTAGA AAGAAATTTG AAGGAATTTC AAGAACATCT 
3551 TCAGGATTCT GTCAAAAACA CCAAAGATTT AAATGTAAAG GAACTCAAGC 
3601 TGAAAGAAGA AATCACACAG TTAACAAATA ATTTGCAAGA TATGAAACAT 
3651 TTACTTCAAT TAAAAGAAGA AGAAGAAGAA ACCAACAGGC AAGAAACAGA 
3701 AAAATTGAAA GAGGAACTCT CTGCAAGCTC TGCTCGTACC CAGAATCTGA 
3751 AAGCAGATCT TCAGAGGAAG GAAGAAGATT ATGCTGACCT GAAAGAGAAA 
3801 CTGACTGATG CCAAAAAGCA GATTAAGCAA GTACAGAAAG AGGTATCTGT 
3851 AATGCGTGAT GAGGATAAAT TACTGAGGAT TAAAATTAAT GAACTGGAGA 
3901 AAAAGAAAAA CCAGTGTTCT CAGGAATTAG ATATGAAGCA GCGAACCATT 
3951 CAGCAACTCA AGGAGCAGTT AAATAATCAG AAAGTGGAAG AAGCTATACA 
4001 ACAGTATGAG AGAGCATGCA AAGATCTAAA TGTTAAAGAG AAAATAATTG 
4051 AAGACATGCG AATGACACTA GAAGAACAGG AACAAACTCA GGTAGAACAG 
4101 GATCAAGTGC TTGAGGCTAA ATTAGAGGAA GTTGAAAGGC TGGCCACAGA 
4151 ATTGGAAAAA TGGAAGGAAA AATGCAATGA TTTGGAAACC AAAAACAATC 
4201 AAAGGTCAAA TAAAGAACAT GAGAACAACA CAGATGTGCT TGGAAAGCTC 
4251 ACTAATCTTC AAGATGAGTT ACAGGAGTCT GAACAGAAAT ATAATGCTGA 
4301 TAGAAAGAAA TGGTTAGAAG AAAAAATGAT GCTTATCACT CAAGCGAAAG 
4351 AAGCAGAGAA TATACGAAAT AAAGAGATGA AAAAATATGC TGAGGACAGG 
4401 GAGCGTTTTT TTAAGCAACA GAATGAAATG GAAATACTGA CAGCCCAGCT
4451 GACAGAGAAA GATAGTGACC TTCAAAAGTG GCGAGAAGAA CGAGATCAAC 
4501 TGGTTGCAGC TTTAGAAATA CAGCTAAAAG CACTGATATC CAGTAATGTA 
4551 CAGAAAGATA ATGAAATTGA ACAACTAAAA AGGATCATAT CAGAGACTTC 
4601 TAAAATAGAA ACACAAATCA TGGATATCAA GCCCAAACGT ATTAGTTCAG
4651 CAGATCCTGA CAAACTTCAA ACTGAACCTC TATCGACAAG TTTTGAAATT
4701 TCCAGAAATA AAATAGAGGA TGGATCTGTA GTCCTTGACT CTTGTGAAGT 
4751 GTCAACAGAA AATGATCAAA GCACTCGATT TCCAAAACCT GAGTTAGAGA 
4801 TTCAATTTAC ACCTTTACAG CCAAACAAAA TGGCAGTGAA ACACCCTGGT 
4851 TGTACCACAC CAGTGACAGT TGAGATTCCC AAGGCTCGGA AGAGGAAGAG 
4901 TAATGAAATG GAGGAGGACT TGGTGAAATG TGAAAATAAG AAGAATGCTA 
4951 CACCCAGAAC TAATTTGAAA TTTCCTATTT CAGATGATAG AAATTCTTCT 
5001 GTCAAAAAGG AACAAAAGGT TGCCATACGT CCATCATCTA AGAAAACATA 
5051 TTCTTTACGG AGTCAGGCAT CCATAATTGG TGTAAACCTG GCCACTAAGA 
5101 AAAAAGAAGG AACACTACAG AAATTTGGAG ACTTCTTACA ACATTCTCCC 
5151 TCAATTCTTC AATCAAAAGC AAAGAAGATA ATTGAAACAA TGAGCTCTTC 
5201 AAAGCTCTCA AATGTAGAAG CAAGTAAAGA AAATGTGTCT CAACCAAAAC 
5251 GAGCCAAACG GAAATTATAC ACAAGTGAAA TTTCATCTCC TATTGATATA 
5301 TCAGGCCAAG TGATTTTAAT GGACCAGAAA ATGAAGGAGA GTGATCACCA 
5351 GATTATCAAA CGACGACTTC GAACAAAAAC AGCCAAATAA ATCACTTATG 
5401 GAAATGTTTA ATATAAATTT TATAGTCATA GTCATTGGAA CTTGCATCCT 
5451 GTATTGTAAA TATAAATGTA TATATTATGC ATTAAATCAC TCTGCATATA 
5501 GATTGCTGTT TTATACATAG TATAATTTTA ATTCAATAAA TGAGTCAAAA 
5551 TTTGTATATT TTTATAAGGC TTTTTTATAA TAGCTTCTTT CAAACTGTAT 
5601 TTCCCTATTA TCTCAGACAT TGGATCAGTG AAGATCCTAG GAAAGAGGCT 
5651 GTTATTCTCA TTTATTTTGC TATACAGGAT GTAATAGGTC AGGTATTTGG 
5701 TTTACTTATA TTTAACAATG TCTTATGAAT TTTTTTTACT TTATCTGTTA 
5751 TACAACTGAT TTTACATATC TGTTTGGATT ATAGCTAGGA TTTGGAGAAT 
5801 AAGTGTGTAC AGATCACAAA ACATGTATAT ACATTATTTA GAAAAGATCT 
5851 CAAGTCTTTA ATTAGAATGT CTCACTTATT TTGTAAACAT TTTGTGGGTA 
5901 CATAGTACAT GTATATATTT ACGGGGTATG TGAGATGTTT TGACACAGGC 
5951 ATGCAATGTG AAATACGTGT ATCATGGAGA ATGAGGTATC CATCCCCTCA 
6001 AGCATTTTTC CTTTGAATTA CAGATAATCC AATTACATTC TTTAGATCAT
6051 TTAAAAATAT ACAAGTAAGT TATTATTGAT TATAGTCACT CTATTGTGCT 
6101 ATCAGATAGT AGATCATTCT TTTTATCTTA TTTGTTTTTG TACCCATTAA 
6151 CCATCCCCAC CTCCCCCTGC AACCGTCAGT ACCCTTACCA GCCACTGGTA 
6201 ACCATTCTTC TACTCTGTAT GCCCATGAGG TCAATTGATT TTATTTTTAG 
6251 ATCCCATAAA TAAATGAGAA CATGCAAAAA AAAA 



BLAST Results 



Entry HS898149 from database EMBL:
human STS WI-9217.
Score = 4247, P = 1.5e-187, identities = 855/862



Medline entries 



94119956: 

Cloning of cDNAs for M-phase phosphoproteins recognized
by the MPM2 monoclonal antibody and determination of the
phosphorylated epitope.

98101856: 

Interaction of a Golgi-associated kinesin-like protein with 
Rab6. 

95122643: 

Identification and partial characterization of mitotic
centromere-associated kinesin, a
kinesin-related protein that associates with centromeres during
mitosis.



Peptide information for frame 3 



ORF from 48 bp to 5387 bp; peptide length: 1780 
Category: known protein 
Classification: Cell structure/motility 
Prosite motifs: ATP_GTP_A (152-160)
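
The Prosite hit above can be illustrated with a regular expression search. A minimal sketch, assuming the ATP/GTP-binding site motif A (P-loop) consensus as it is commonly written, [AG]-x(4)-G-K-[ST]; the source does not list the pattern itself. The helper name and the fragment variable are illustrative; the fragment is residues 101-200 of the frame 3 peptide.

```python
import re

# Assumed P-loop consensus [AG]-x(4)-G-K-[ST], written as a regular expression.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(sequence: str, offset: int = 1) -> list:
    """Return (start, end, match) tuples using 1-based, inclusive positions.

    offset is the position of the first residue of `sequence` in the protein.
    """
    return [
        (m.start() + offset, m.end() + offset - 1, m.group())
        for m in P_LOOP.finditer(sequence)
    ]


if __name__ == "__main__":
    # Residues 101-200 of the frame 3 peptide.
    fragment = (
        "LSEKSSGQMAQKFSFSKVFGPATTQKEFFQGCIMQPVKDLLKGQSRLIFT"
        "YGLTNSGKTYTFQGTEENIGILPRTLNVLFDSLQERLYTKMNLKPHRSRE"
    )
    print(find_p_loops(fragment, offset=101))
```

Under this assumed pattern the match GLTNSGKT starts at residue 152, the start position given in the report. The eight-residue pattern ends at residue 159; the end coordinate of 160 in the report may follow a different convention.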



1 MESNFNQEGV PRPSYVFSAD PIARPSEINF DGIKLDLSHE FSLVAPNTEA
51 NSFESKDYLQ VCLRIRPFTQ SEKELESEGC VHILDSQTVV LKEPQCILGR
101 LSEKSSGQMA QKFSFSKVFG PATTQKEFFQ GCIMQPVKDL LKGQSRLIFT
151 YGLTNSGKTY TFQGTEENIG ILPRTLNVLF DSLQERLYTK MNLKPHRSRE
201 YLRLSSEQEK EEIASKSALL RQIKEVTVHN DSDDTLYGSL TNSLNISEFE
251 ESIKDYEQAN LNMANSIKFS VWVSFFEIYN EYIYDLFVPV SSKFQKRKML
301 RLSQDVKGYS FIKDLQWIQV SDSKEAYRLL KLGIKHQSVA FTKLNNASSR
351 SHSIFTVKIL QIEDSEMSRV IRVSELSLCD LAGSERTMKT QNEGERLRET
401 GNINTSLLTL GKCINVLKNS EKSKFQQHVP FRESKLTHYF QSFFNGKGKI
451 CMIVNISQCY LAYDETLNVL KFSAIAQKVC VPDTLNSSQD KLFGPVKSSQ
501 DVSLDSNSNS KILNVKRATI SWENSLEDLM EDEDLVEELE NAEETQNVET
551 KLLDEDLDKT LEENKAFISH EEKRKLLDLI EDLKKKLINE KKEKLTLEFK
601 IREEVTQEFT QYWAQREADF KETLLQEREI LEENAERRLA IFKDLVGKCD
651 TREEAAKDIC ATKVETEEAT ACLELKFNQI KAELAKTKGE LIKTKEELKK
701 RENESDSLIQ ELETSNKKII TQNQRIKELI NIIDQKEDTI NEFQNLKSHM
751 ENTFKCNDKA DTSSLIINNK LICNETVEVP KDSKSKICSE RKRVNENELQ
801 QDEPPAKKGS IHVSSAITED QKKSEEVRPN IAEIEDIRVL QENNEGLRAF
851 LLTIENELKN EKEEKAELNK QIVHFQQELS LSEKKNLTLS KEVQQIQSNY
901 DIAIAELHVQ KSKNQEQEEK IMKLSNEIET ATRSITNNVS QIKLMHTKID
951 ELRTLDSVSQ ISNIDLLNLR DLSNGSEEDN LPNTQLDLLG NDYLVSKQVK
1001 EYRIQEPNRE NSFHSSIEAI WEECKEIVKA SSKKSHQIEE LEQQIEKLQA
1051 EVKGYKDENN RLKEKEHKNQ DDLLKEKETL IQQLKEELQE KNVTLDVQIQ
1101 HVVEGKRALS ELTQGVTCYK AKIKELETIL ETQKVERSHS AKLEQDILEK
1151 ESIILKLERN LKEFQEHLQD SVKNTKDLNV KELKLKEEIT QLTNNLQDMK
1201 HLLQLKEEEE ETNRQETEKL KEELSASSAR TQNLKADLQR KEEDYADLKE
1251 KLTDAKKQIK QVQKEVSVMR DEDKLLRIKI NELEKKKNQC SQELDMKQRT
1301 IQQLKEQLNN QKVEEAIQQY ERACKDLNVK EKIIEDMRMT LEEQEQTQVE
1351 QDQVLEAKLE EVERLATELE KWKEKCNDLE TKNNQRSNKE HENNTDVLGK
1401 LTNLQDELQE SEQKYNADRK KWLEEKMMLI TQAKEAENIR NKEMKKYAED
1451 RERFFKQQNE MEILTAQLTE KDSDLQKWRE ERDQLVAALE IQLKALISSN
1501 VQKDNEIEQL KRIISETSKI ETQIMDIKPK RISSADPDKL QTEPLSTSFE
1551 ISRNKIEDGS VVLDSCEVST ENDQSTRFPK PELEIQFTPL QPNKMAVKHP
1601 GCTTPVTVEI PKARKRKSNE MEEDLVKCEN KKNATPRTNL KFPISDDRNS
1651 SVKKEQKVAI RPSSKKTYSL RSQASIIGVN LATKKKEGTL QKFGDFLQHS
1701 PSILQSKAKK IIETMSSSKL SNVEASKENV SQPKRAKRKL YTSEISSPID
1751 ISGQVILMDQ KMKESDHQII KRRLRTKTAK

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_35b4, frame 3

TREMBL:U93121_1 product: "M-phase phosphoprotein-1"; Human M-phase
phosphoprotein-1 mRNA, partial cds., N = 1, Score = 3743, P = 0

PIR:A36881 MPM2-reactive phosphoprotein 1 - human (fragment), N = 2,
Score = 2808, P = 2.5e-294

TREMBL:AF070672_1 product: "rabkinesin6"; Homo sapiens rabkinesin6
mRNA, complete cds., N = 2, Score = 680, P = 2.6e-99



>TREMBL:U93121_1 product: "M-phase phosphoprotein-1"; Human M-phase 
phosphoprotein-1 mRNA, partial cds. 
Length = 753

HSPs: 

Score = 3743 (561.6 bits), Expect = 0.0e+00, P = 0.0e+00
Identities = 752/753 (99%), Positives = 753/753 (100%) 

Query: 1028 VKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEHKNQDDLLKEKETLIQQLKEE 1087 

VKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEHKNQDDLLKEKETLIQQLKEE 
Sbjct: 1 VKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEHKNQDDLLKEKETLIQQLKEE 60 

Query: 1088 LQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKVERSHSAKLEQDI 1147

LQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKVERSHSAKLEQDI
Sbjct: 61 LQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKVERSHSAKLEQDI 120

Query: 1148 LEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMKHLLQLKE 1207 

LEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMKHLLQLKE 
Sbjct: 121 LEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMKHLLQLKE 180 

Query: 1208 EEEETNRQETEKLKEELSASSARTQNLKADLQRKEEDYADLKEKLTDAKKQIKQVQKEVS 1267 

EEEETNRQETEKLKEELSASSARTQNLKADLQRKEEDYADLKEKLTDAKKQIKQVQKEVS 
Sbjct: 181 EEEETNRQETEKLKEELSASSARTQNLKADLQRKEEDYADLKEKLTDAKKQIKQVQKEVS 240 

Query: 1268 VMRDEDKLLRIKINELEKKKNQCSQELDMKQRTIQQLKEQLNNQKVEEAIQQYERACKDL 1327 

VMRDEDKLLRIKINELEKKKNQCSQELDMKQRTIQQLKEQLNNQKVEEAIQQYERACKDL 
Sbjct: 241 VMRDEDKLLRIKINELEKKKNQCSQELDMKQRTIQQLKEQLNNQKVEEAIQQYERACKDL 300 

Query: 1328 NVKEKIIEDMRMTLEEQEQTQVEQDQVLEAKLEEVERLATELEKWKEKCNDLETKNNQRS 1387 

NVKEKIIEDMRMTLEEQEQTQVEQDQVLEAKLEEVERLATELEKWKEKCNDLETKNNQRS 
Sbjct: 301 NVKEKIIEDMRMTLEEQEQTQVEQDQVLEAKLEEVERLATELEKWKEKCNDLETKNNQRS 360 

Query: 1388 NKEHENNTDVLGKLTNLQDELQESEQKYNADRKKWLEEKMMLITQAKEAENIRNKEMKKY 1447 

NKEHENNTDVLGKLTNLQDELQESEQKYNADRKKWLEEKMMLITQAKEAENIRNKEMKKY 
Sbjct: 361 NKEHENNTDVLGKLTNLQDELQESEQKYNADRKKWLEEKMMLITQAKEAENIRNKEMKKY 420

Query: 1448 AEDRERFFKQQNEMEILTAQLTEKDSDLQKWREERDQLVAALEIQLKALISSNVQKDNEI 1507 

AEDRERFFKQQNEMEILTAQLTEKDSDLQKWREERDQLVAALEIQLKALISSNVQKDNEI 
Sbjct: 421 AEDRERFFKQQNEMEILTAQLTEKDSDLQKWREERDQLVAALEIQLKALISSNVQKDNEI 480

Query: 1508 EQLKRIISETSKIETQIMDIKPKRISSADPDKLQTEPLSTSFEISRNKIEDGSVVLDSCE 1567 

EQLKRIISETSKIETQIMDIKPKRISSADPDKLQTEPLSTSFEISRNKIEDGSVVLDSCE
Sbjct: 481 EQLKRIISETSKIETQIMDIKPKRISSADPDKLQTEPLSTSFEISRNKIEDGSVVLDSCE 540 

Query: 1568 VSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTVEIPKARKRKSNEMEEDLVK 1627 

VSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTV+IPKARKRKSNEMEEDLVK 
Sbjct: 541 VSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTVKIPKARKRKSNEMEEDLVK 600 

Query: 1628 CENKKNATPRTNLKFPISDDRNSSVKKEQKVAIRPSSKKTYSLRSQASIIGVNLATKKKE 1687 

CENKKNATPRTNLKFPISDDRNSSVKKEQKVAIRPSSKKTYSLRSQASIIGVNLATKKKE 
Sbjct: 601 CENKKNATPRTNLKFPISDDRNSSVKKEQKVAIRPSSKKTYSLRSQASIIGVNLATKKKE 660 

Query: 1688 GTLQKFGDFLQHSPSILQSKAKKIIETMSSSKLSNVEASKENVSQPKRAKRKLYTSEISS 1747 

GTLQKFGDFLQHSPSILQSKAKKIIETMSSSKLSNVEASKENVSQPKRAKRKLYTSEISS 
Sbjct: 661 GTLQKFGDFLQHSPSILQSKAKKIIETMSSSKLSNVEASKENVSQPKRAKRKLYTSEISS 720 

Query: 1748 PIDISGQVILMDQKMKESDHQIIKRRLRTKTAK 1780 

PIDISGQVILMDQKMKESDHQIIKRRLRTKTAK 
Sbjct: 721 PIDISGQVILMDQKMKESDHQIIKRRLRTKTAK 753 
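
The percentages in the HSP headers appear to be truncated instead of rounded: 752/753 is reported as 99%. A minimal sketch under that assumption (inferred from the numbers in this report, not documented in the source); the function name is illustrative.

```python
def blast_percent(matches: int, aligned_length: int) -> int:
    """Integer percentage as it appears in the HSP headers.

    Assumption: the value is truncated toward zero, not rounded.
    """
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100 * matches // aligned_length


if __name__ == "__main__":
    print(blast_percent(752, 753))  # prints 99
    print(blast_percent(753, 753))  # prints 100
```

Truncation also accounts for lower-scoring HSPs in this report, for example 253/542 shown as 46%.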

Score = 197 (29.6 bits), Expect = 2.1e-11, P = 2.1e-11
Identities = 114/542 (21%), Positives = 253/542 (46%)

Query: 692 IKTKEELKKRENESDSLIQELETSNKKIITQNQRIKELINIIDQKEDTINEFQNLKSHM- 750 

+K + + E + I++L+ K +N R+KE + ++D + £ + L + 
Sbjct: 1 VKASSKKSHQI EELEQQIEKLQAEVKGYKDENNRLKEKEH — KNQDDLLKEKETLIQQLK 58 

Query: 751 ENTFKCNDKADTS-SLIINNKLICNETVEVPKDSKSKICSERKRVNENELQQDEPPAK — 807 

E + N D ++ K +E + K+KI E + + E + + AK 

Sbjct: 59 EELQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKI -RELET ILETQKVERSHSAKLE 117 

Query: 808 KGSIHVSSAITEDQKKSEEVRPNIAE-IEDIRVLQENNEGLRAFLLTIENELKNEK 862 
+ + S I + ++ +E + ++ + + + + + L L+ + + N L++ K

Sbjct: 118 QDILEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMKHLLQ 177 

Query: 863 — EEKAELNKQIVH-FQQELSLSEKKNLTLSKEVQQIQSNYDIAIAELHVQKSKNQEQEE 919 

EE+ E N+Q ++ELS S + L ++Q+ + +Y A+L K K + ++ 
Sbjct: 178 LKE E EEETN RQ ETEKL KEELS AS SARTQNLKADLQRKEEDY ADL KEKLTDAKK 230 

Query: 920 KIMKLSNEIETATRSITNNVSQIKLMHTKIDEL-RTLDSVSQISNIDLLNLRDLSNGSEE 978 

+ 1 ++ E+ S+ + + KL+ KI+EL + + SQ +D+ R + E+ 

Sbjct: 231 QIKQVQKEV SVMRD--EDKLLRIKINELEKKKNQCSQ — ELDMKQ-RTIQQLKEQ 280 

Query: 979 DNLPNTQLDLLGNDYLVSKQVKEYRIQEPNRENSFHSSIEAIWEECKEIVKASSKKSHQI 1038 

N N +++ Y + K+ ++E E+ ++E + E + K ++ 

Sbjct: 281 LN — NQKVEEAIQQY — ERACKDLNVKEKIIED-MRMTLEEQEQTQVEQDQVLEAKLEEV 335 

Query: 1039 EELEQQIEKLQAEVKGYKDENNRLKEKEHKNQDDLLKEKETLIQQLKEELQEKNVT 1094 

E L ++EK + + + +NN+ KEH+N D+L + L +L+E Q+ N 
Sbjct: 336 ERLATELEKWKEKCNDLETKNNQRSNKEHENNTDVLGKLTNLQDELQESEQKYNADRKKW 395

Query: 1095 LDVQIQHVVEGKRA LSELTQGVTCYKAKIKELETILETQKVERSHSAKLEQDI 1147 

L++++ + KA ++ + + + E+E IL Q E+ + ++ 

Sbjct: 396 LEEKMMLITQAKEAENIRNKEMKKYAEDRERFFKQQNEME-ILTAQLTEKDSDLQKWRE- 453 

Query: 1148 LEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELK-LKEEITQLTNNLQDMKHLLQLK 1206

E++ ++ LE LK + +V+ KD +++LK + E +++ + D+K + 
Sbjct: 454 - ERDQLV AAL E I QL KAL ISSNVQ— KDNEIEQLKRI ISETSKIETQIMDIK PKR 504 

Query: 1207 EEEEETNRQETEKLKEELSASSARTQN 1233 

+ ++ +TE L S + ++ 

Sbjct: 505 ISSADPDKLQTEPLSTSFEISRNKIED 531 

Score = 186 (27.9 bits), Expect = 3.2e-10, P = 3.2e-10 
Identities = 131/674 (19%), Positives = 294/674 (43%)

Query: 673 LELKFNQIKAELAKTKGELIKT-KEELKKRENESDSLIQELETSNKKIITQNQRIKELIN 731 

L+ K ++ + +L K K LI+ KEEL+++ D IQ + + + Q + 
Sbjct: 35 LKEKEHKNQDDLLKEKETLIQQLKEELQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKA 94 

Query: 732 IIDQKEDTINEFQNL-KSHMENTFKCNDKADTSSLIINNKLICNETVEVPKDSKSKICSE 790 

I + E TI E Q + +SH + D + S+I+ + EE +DS 
Sbjct: 95 KIKELE-TILETQKVERSHSAKLEQ— DI LEKESI I LKLERNLKEFQEHLQDSt VKN 147 

Query: 791 RKRVNENELQ-QDEPPAKKGSIHVSSAITEDQKKSEEV-RPNIAEI-EDIRVLQENNEGL 847 

K +N EL+ ++E ++ + + +++ EE R ++ E++ + L 

Sbjct: 148 TKDLNVKELKLKEEITQLTNNLQDMKHLLQLKEEEEETNRQETEKLKEELSASSARTQNL 207 

Query: 848 RAFLLTIENELKNEKEEKAELNKQIVHFQQELSLSEKKNLTLSKEVQQI QSNYDI 902 

+A L E + + KE+ + KQI Q+E+S+ ++ L ++ ++ Q + ++ 

Sbjct: 208 KADLQRKEEDYADLKEKLTDAKKQIKQVQKEVSVMRDEDKLLRIKINELEKKKNQCSQEL 267 

Query: 903 AIAELHVQKSKNQEQEEKIMKLSNEIETATRSITNNVSQIKLMHTKIDEL-RTLDSVSQI 961 

+ + +Q+ K Q +K+ + +EA + + I+M ++E +T Q+ 

Sbjct: 268 DMKQRTIQQLKEQLNNQKVEEAIQQYERACKDLNVKEKIIEDMRMTLEEQEQTQVEQDQV 327 

Query: 962 SNIDLLNLRDLSNGSEEDNLPNTQLDLLGNDYLVSKQVKEYRI — QEPNRENSFHSSIEA 1019 

L + L+ E+ L+ N + + + N ++ S + 

Sbjct: 328 LEAKLEEVERLATELEKWKEKCNDLETKNNQRSNKEHENNTDVLGKLTNLQDELQESEQK 387 

Query: 1020 IWEECKEIVKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEHKNQ — DDLLKEK 1077 

+ K+ ++ Q +E E K E+K Y ++ R +++++ + L EK 

Sbjct: 388 YNADRKKWLEEKMMLITQAKEAENIRNK EMKKYAEDRERFFKQQNEMEILTAQLTEK 444 

Query: 1078 ETLIQQLKEELQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKVER 1137

++ +Q+ +EE + L++Q++ ++ + + ++ ++ET + K +R 

Sbjct: 445 DSDLQKWREERDQLVAALEIQLKALISSNVQKDNEIEQLKRIISETSKIETQIMDIKPKR 504

Query: 1138 SHSAKLEQDI LEKESI ILKLERNLKEFQEHLQDS VKNTKDLNVKELKLKEEITQLT 1193 

SA ++ E S ++ RN E + DS +N + + +L+ + T L 

Sbjct: 505 ISSADPDKLQTEPLSTSFEISRNKIEDGSVVLDSCEVSTENDQSTRFPKPELEIQFTPLQ 564 

Query: 1194 NNLQDMKH LLQLKEEEEETNRQETEKLKEEL-SASSARTQNLKADLQRKEEDYADLK 1249 

N +KH + + + ++++ +++E+L + + + +L+ D + 

Sbjct: 565 PNKMAVKHPGCTTPVTVKIPKARKRKSNEMEEDLVKCENKKNATPRTNLKFPISDDRNSS 624 

Query: 1250 EKLTDAKKQIKQVQKEVSVMRDEDKLLRIKINELEKKKNQCSQEL-DMKQRTIQQLKEQL 1308 

K + K 1+ K+ +R + + I +N KKK Q+ D Q + L+ + 
Sbjct: 625 VK-KEQKVAIRPSSKKTYSLRSQASI — IGVNLATKKKEGTLQKFGDFLQHSPSILQSKA 681 

Query: 1309 NNQKVEEAIQQYERACKDLNVKEKIIEDMR 1338 

+K+ E+ + + + + KE + + R 
Sbjct: 682 — KKIIETMSSSKLSNVEAS-KEMVSQPKR 708 
Score = 165 (24.8 bits), Expect = 5.8e-08, P = 5.8e-08
Identities = 140/626 (22%), Positives = 271/626 (43%)

Query: 536 VEELENAEETQNVETKLLDEDLDKTLEENKAFISHEEKRKLLDLIEDLKKKLINEKKEK- 594

+EELE E E K +D + L+E + H+ + LL E L ++L E +EK
Sbjct: 11 IEELEQQIEKLQAEVKGY-KDENNRLKEKE HKNQDDLLKEKETLIQQLKEELQEKN 65

Query: 595 LTLEFKIREEVT QE FTQYW AQRE AD FKE- -TLLQEREI LE EN A E R RL A I FK DLVG 647

+TL+ +I+ V E TQ +A KE T+L+ +++ E + +L +D++
Sbjct: 66 VTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKV-ERSHSAKLE--QDILE 122

Query: 648 KCDT REEAAKDI CATKVETEEATACLELKFNQI KAELAKTKGELI KTKEELKKRENE 704

K E K+ ++ + T L +K ++K E+ + L K L+ +E E
Sbjct: 123 KESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMKHLLQLKEEE 182

Query: 705 SDSLIQELETSNKKIITQNQRIKELINIIDQKEDTINEFQNLKSHMENTFKCNDKADTSS 764

++ QE -E +++ + R + L + +KE+ ++ + + K K + S
Sbjct: 183 EETNRQETEKLKEELSASSARTQNLKADLQRKEEDYADLKEKLTDAKKQIKQVQK-EVSV 241

Query: 765 LIINNKLICNETVEVPKDSKSKICSERKRVNENELQQDEPPAKKGSIHVSSAITEDQKKS 824

+ +KL+ + E+ K K CS+ + + +QQ + V AI + ++
Sbjct: 242 MRDEDKLLRIKINELEK--KKNQCSQELDMKQRTIQQLKEQLNNQK--VEEAIQQYERAC 297

Query: 825 EEVRPNIAEIEDIRVLQENNEGLRAFLLTIENELKNEKEEKAELNKQIVHFQQELSLSEK 884

+++ IED+R+ E E + + + L+ + EE L ++ ++++ + E
Sbjct: 298 KDLNVKEKIIEDMRMTLEEQEQTQ VEQDQVLEAKLEEVERLATELEKWKEKCNDLET 354

Query: 885 KNLTLSKEVQQIQSNYDIAIAELHVQKSKNQEQEEKIMKLSNE-IETATRSITN N 938

KN S + + ++N D+ + +L + + QE E+K + +E IT N
Sbjct: 355 KNNQRSNK--EHENNTDV-LGKLTNLQDELQESEQKYNADRKKWLEEKMMLITQAKEAEN 411

Query: 939 VSQIKLMHTKIDELRTLDSVSQISNIDL-LNLRD--LSNGSEEDNLPNTQLDLLGNDYLV 995

+ ++ D R +++ + L +D L EE + L++ +
Sbjct: 412 IRNKEMKKYAEDRERFFKQQNEMEILTAQLTEKDSDLQKWREERDQLVAALEIQLKALIS 471

Query: 996 SKQVKEYRIQEPNRENSFHSSIEA-IWE-ECKEIVKASSKKSHQIEELEQQIEKLQAEVK 1053

S K+ I++ RSSIEI+ + KIA KQEL E + +++
Sbjct: 472 SNVQKDNEIEQLKRIISETSKIETQIMDIKPKRISSADPDKL-QTEPLSTSFEISRNKIE 530

Query: 1054 GYKDENNRLKEKEHKNQDDLLKEKE TLIQQLKEELQEKNVTLDVQIQHVVEGKRA 1108

+ + +Q + E T +Q K ++ T V ++ KR
Sbjct: 531 DGSVVLDSCEVSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTVKIPKARKRK 590

Query: 1109 LSELTQG-VTCYKAKIKELETILETQ-KVERSHSAKLEQDILEKES 1152

+ E+ + V C K T L+ +R+ S K EQ + + S
Sbjct: 591 SNEMEEDLVKCENKKNATPRTNLKFPISDDRNSSVKKEQKVAIRPS 636

Score = 143 (21.5 bits), Expect = 1.3e-05, P = 1.3e-05
Identities = 164/684 (23%), Positives = 304/684 (44%)

Query: 295 QKRKMLR-LSQDVKGYSFIKDLQWIQVSDSKEAYRLLKLGIKHQSVAFTKLNNASS 349 

+K +++ L ++++ + D+Q V + K A L G+ +L 
Sbjct: 49 EKETLIQQLKEELQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKV 108

Query: 350 -RSHSI-FTVKILQIEDSEMSRVIRVSELSLCDLAGSERTMKTQNEGE-RLRETGNINTS 406 

RSHS IL+E + + E LS+KNE +L+E T+ 

Sbjct: 109 ERSHSAKLEQDILEKESIILKLERNLKEFQE-HLQDSVKNTKDLNVKELKLKEEITQLTN 167 

Query: 407 LLTLGKCINVLKNSEKSKFQQHVPFRESKLTHYFQSFFNGKGKICMIVNISQCYLAYDET 466 

L K + LK E+ +Q + +L+ N K + + Y E 

Sbjct: 168 NLQDMKHLLQLKEEEEETNRQETEKLKEELSASSARTQNLKADL QRKEEDYADLKEK 224 

Query: 467 LNVLKFSAIAQKVCVPDTLNSSQDKLFGPVKSSQDVSLDSNSNSKILNVKRATISWENSL 526

L K I Q V ++ +DKL +K ++ + N S+ L++K+ TI 
Sbjct: 225 LTDAK- KQI KQ-VQKEVS VMRDEDKLLR- I KINE-LEKKKNQCSQELDMKQRT IQQLKEQ 280 

Query: 527 EDLMEDEDLVEELENAEETQNVETKLLDEDLDKTLEENKAFISHEEKRKLLDL-IEDLKK 585 

+ + E+ +++ E A + NV+ K++ ED+ TLEE + + E+ ++L+ +E++++ 
Sbjct: 281 LNNQKVEEAIQQYERACKDLNVKEKII-EDMRMTLEEQEQ--TQVEQDQVLEAKLEEVER 337 

Query: 586 KLIN-EK-KEKLT-LEFKIREEVTQEFTQYWAQREADFKETLLQEREILEE NAERR 638 

EK KEK LE K + +E + K T LQ+ E+ E NA+R+ 

Sbjct: 338 L AT EL E KWK E KCN DL ET KNNQRS N KE H EN NTDVLGKLTKLQD-ELQESEQKYNADRK 393 

Query: 639 LAI FKDLVGKCDTREEAAKDICATKVETEEATACLELKFNQI KAELAKTKGELI KTKE EL 698 

+ + ++ T+ + A++I K E ++ E F Q + E+ +L + +L 

Sbjct: 394 KWLEEKMM— LITQAKEAENI-RNK-EMKKYAEDRERFFKQ-QNEMEILTAQLTEKDSDL 448 

Query: 699 KKRENESDSLIQELETSNKKI ITQN-QR 1 KEL I N 1 1 DQKEDT I N EFQN LKSHMENT F 754 

+K E D L+ LE K +1+ N Q+ I++L II + + ++K ++ 
Sbjct: 449 QKWREERDQLVAALEIQLKALISSNVQKDNEIEQLKRIISETSKIETQIMDIKPKRISSA 508

Query: 755 KCNDKADTSSLIINNKLICN--ETVEVPKDSKSKICSERK RVNENELQ-QDEP--PA 806

DK T L + ++ N E V DS ++ +E R + EL+ Q P P 

Sbjct: 509 D-PDKLQTEPLSTSFEISRNKIEDGSVVLDS-CEVSTENDQSTRFPKPELEIQFTPLQPN 566 

Query: 807 KKGSIH-- VSSAITEDQKKSEEVRPNIAEIEDIRVLQENNEGLRA— FLLTIENELKNE 861 

K H +++T K+++NE+++ + N R F+++ + 
Sbjct: 567 KMAVKH PGCTTPVTVKIPKARKRKSNEMEEDLVKCENKKNATPRTNLKFPISDDRNSSVK 626 

Query: 862 KEEKAEL NKQIVHFQQELSLSEKKNLTLSKEVQQIQSNYDIAIAELHVQKSKNQEQE 918 

KE+K + +K+ + + S+ NL K+ +Q D + +SK ++ 

Sbjct: 627 KEQKVAIRPSSKKTYSLRSQASI IGV-NLATKKKEGTLQKFGDFLQHSPSILQSKAKKI I 685 

Query: 919 EKIM — KLSNEIETATRSITNNVSQIKLMHTKI — DELRT-LDSVSQISNID 965 

E + KLSN +E + NVSQ K K+ E+ + +D Q+ +D 

Sbjct: 686 ETMSSSKLSN-VEASKE NVSQPKRAKRKLYTSEISSPIDISGQVILMD 732 

Score = 133 (20.0 bits), Expect = 1.6e-04, P = 1.6e-04
Identities = 94/426 (22%), Positives = 188/426 (44%)

Query: 527 EDLM-EDEDLVEELENAEETQNVETKLLDEDLDKTLEENKAFISHEEKRKLLDL-IEDLK 584 

+DL+ E E L+++L+ + +NV LD + +E +A + I++L+ 

Sbjct: 44 DDLLKEKETLIQQLKEELQEKNVT LDVQIQHVVEGKRALSELTQGVTCYKAKIKELE 100 

Query: 585 KKLINEKKEKLTLEFKIREEVTQ-EFTQYWAQREA-DFKETLLQEREILEENAERRLAIF 642 

L +K E+ + K+ +++ + E +R +F+E L + ++ + L + 

Sbjct: 101 TILETQKVER-SHSAKLEQDILEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKL- 158 

Query: 643 KDLVGKCDTREEAAKDICATKVETEEATACLELKFNQIKAELAKTKGELIKTKEELKKRE 702 

K+++ +K+ K E EE + ++K EL+ + K +L+++E 

Sbjct: 159 KEEITQLTNNLQDMKHLLQLKEEEEETN-—RQETEKLKEELSASSARTQNLKADLQRKE 215 

Query: 703 NESDSLIQELETSNKKIITQNQRIKELINIIDQK-EDTINEFQNLKSHMENTFKCNDKA- 760 

+ L ++L T KKIQQ+ ++ D+ INE + K+ + 

Sbjct: 216 EDYADLKEKL-TDAKKQIKQVQKEVSVMRDEDKLLRIKINELEKKKNQCSQELDMKQRTI 274

Query: 761 DTSSLI INNKLICNETVE VPKDS--KSKICSE-RKRVNENE LQQDEPPAKKGS 810

+KN+ + E ++ KD K KI + R + E E ++QD+ K 

Sbjct: 275 QQLKEQLNNQKV-EEAIQQYERACKDLNVKEKIIEDMRMTLEEQEQTQVEQDQVLEAKLE 333 

Query: 811 IHVSSAITEDQKKSEEVRP-NIAEIEDIRVLQENNEGLRAFLLTIENELKNEKEEKAELN 869 

V TE +K E+ + ENN + L +++EL+ E E+K + 

Sbjct: 334 -EVERLATELEKWKEKCNDLETKNNQRSNKEHENNTDVLGKLTNLQDELQ-ESEQKYNAD 391

Query: 870 KQIVHFQQELSLSEKKNLTLSKEVQQIQSNYDIAIAELHVQKSKNQEQEEKIMKLSNEIE 929 

++ ++++ L +T +KE + I++ + K E E+ K NE+E 

Sbjct: 392 RK-KWLEEKMML ITQAKEAEN I RNK EMKKYAEDRERFFKQQNEME 435 

Query: 930 TATRS ITNNVSQI KLMHTKI DEL 952 

T +T S ++ + D+L 
Sbjct: 436 ILTAQLTEKDSDLQKWREERDQL 458 

Pedant information for DKFZphtes3_35b4 , frame 3 



Report for DKFZphtes3_35b4 . 3 

[LENGTH] 1780

[MW] 206176.77

[pI] 5.60

[HOMOL] TREMBL:U93121_1 product: "M-phase phosphoprotein-1"; Human M-phase
phosphoprotein-1 mRNA, partial cds. 0.0

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YEL061c] 2e-37

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YEL061c] 2e-37

[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YEL061c] 2e-37

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YEL061c] 2e-37

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 7e-30

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 7e-30

[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YPR141c] 3e-23

[FUNCAT] 11.01 stress response [S. cerevisiae, YPR141c] 3e-23

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins
[S. cerevisiae, YPR141c] 3e-23

[FUNCAT] 03.13 meiosis [S. cerevisiae, YPR141c] 3e-23

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YPR141c] 3e-23

[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YPR141c] 3e-23

[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision
repair) [S. cerevisiae, YKR095w] 1e-21

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR309c] 6e-20

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YHR023w
MYO1 - myosin-1 isoform] 4e-19

[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 4e-19

[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YNL250w] 1e-15

[FUNCAT] 1 genome replication, transcription, recombination and repair [M.
jannaschii, MJ1322] 2e-14

[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YDR285w] 2e-09

[FUNCAT] 09.04 biogenesis of cytoskeleton [S. cerevisiae, YKL179c] 3e-09

[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YLR086w] 2e-07

[FUNCAT] 03.01 cell growth [S. cerevisiae, YNL079c] 2e-07

[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YNL079c]
2e-07

[FUNCAT] 03.22.01 cell cycle check point proteins [S. cerevisiae, YGL086w] 1e-06

[FUNCAT] 10.05.99 other pheromone response activities [S. cerevisiae, YHR158c]
3e-06

[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YDR217c] 4e-06

[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, Y.TRl^dc] 7e-05

[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae,
YAL035w] 2e-04

[FUNCAT] r general function prediction [M. jannaschii, MJ125d] 0.001


[ BLOCKS ) 


BL00387A 


[ BLOCKS ] 


BL00411H 


F BLOCKS 1 


BL00411G 


f BLOCKS 1 


BL0041 1 P 

DJJl/U 11*1 


r BLOCKS! 


RT.findllR Kinp^i n motor fiomain nrotPins 


f RT OTKQ 1 
[ DlAA*Iw J 




r BLOCKS 1 


RTiOOilllP Kinesin motor Homain nrotpins 


L OLiVUfw J 


DLuu^iio nines j.n motor uonicij.n prot.ej.ns 


r RT.nrTf q i 


ojjUUti.J.n Mnesin motor aouiain pi.otcj.n5 


r cpfiDi 

I OV-Uf J 


qlxiui 1 t\xnesj.n it\ai macius jiorvegxcus/ ze do 


[ SCOP] 


o^Liuao 1 1 iu j . s . i< 1 1 cupoiuyosin lioooiL (urytto la^us curiJ.cuJ.usj ie us 


[ SCOP] 


UJJiaL jt£.j,±.mj, t t [DaM;i O jrcdsu \ ja^.wncii. winy L,t; 


( EC] 




( PIRKW) 


tiiicI 011*5 /to-9*7 


[ PIRKW] 


pnospnot iTaiis ceir asc js id 


[ PIRKW] 


uUpj.JiCatJ.Oii Qc £U 


[ PI RKW] 


^•i t" nil 1 ino IP 


[ PIRKW] 


tanripm rpoeat 4p— ?5 ^ 


[ PIRKW] 


Vi o'h 0 rrtH i mo r "^o — 5ft 

JlCUCiUUXULCi, JC 


[PIRKW] endocytosis 1e-23 

[PIRKW] heart 1e-17 

[PIRKW] transmembrane protein 2e-28 

[PIRKW] serine/threonine-specific protein kinase 3e-16 

[PIRKW] zinc finger 1e-23 


[ PIRKW] 


q ii l- f a r*c» ant - *1 nor* 

ICiwC OilUXlJCil 1W 


[ PIRKW] 




[PIRKW] 


mo t ja 1 hi nHi nn 1 o— 7^ 

IUCLOiL AJXiiU XiiVJ IC *J 


[PIRKW] muscle contraction 4e-24 

[PIRKW] heterotetramer 4e-24 

[PIRKW] acetylated amino end 2e-19 

[PIRKW] actin binding 5e-25 

[PIRKW] mitosis 3e-58 

[PIRKW] microtubule binding 3e-58 

[PIRKW] ATP 3e-58 

[PIRKW] thick filament 4e-24 

[PIRKW] phosphoprotein 9e-29 

[PIRKW] leucine zipper 1e-12 

[PIRKW] skeletal muscle 8e-24 

[PIRKW] disulfide bond 1e-12 

[PIRKW] heterotrimer 1e-29 

[PIRKW] calcium binding 6e-18 

[PIRKW] alternative splicing 4e-21 

[PIRKW] P-loop 2e-63 

[PIRKW] coiled coil 3e-58 

[PIRKW] heptad repeat 1e-?5 

[PIRKW] methylated amino acid 4e-24 

[PIRKW] peripheral membrane protein 1e-23 

[PIRKW] dimer 1e-12 

[PIRKW] cardiac muscle 1e-17 

[PIRKW] hydrolase 5e-25 

[PIRKW] microtubule 6e-15 

[PIRKW] muscle 7e-23 

[PIRKW] membrane protein 6e-20 

[PIRKW] GTP binding 8e-22 

[PIRKW] EF hand 6e-18 

[PIRKW] cell division 1e-25 

[PIRKW] cytoskeleton 4e-24 

[PIRKW] hair 6e-18 

[PIRKW] Golgi apparatus Se-24 

[PIRKW] calmodulin binding 1e-23 

[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 3e-16 

[SUPFAM] myosin motor domain homology 5e-25 

[SUPFAM] alpha-actinin actin-binding domain homology 1e-13 

[SUPFAM] kinesin-related protein KIP1 9e-27 

[SUPFAM] kinesin-related protein CIN8 4e-36 

[SUPFAM] kinesin heavy chain 4e-24 

[SUPFAM] plectin 1e-13 

[SUPFAM] trichohyalin 6e-18 

[SUPFAM] kinesin-related protein KIF3 1e-29 

[SUPFAM] kinesin-related protein KIF2 3e-20 

[SUPFAM] ribosomal protein S10 homology 1e-13 

[SUPFAM] giantin 8e-24 

[SUPFAM] protein kinase homology 3e-16 

[SUPFAM] protein kinase C zinc-binding repeat homology 2e-13 

[SUPFAM] kinesin-related protein unc-104 8e-26 

[SUPFAM] human early endosome antigen 1 1e-23 

[SUPFAM] unassigned kinesin-related proteins 1e-28 

[SUPFAM] Mycoplasma genitalium hypothetical protein MG218 4e-17 

[SUPFAM] myosin heavy chain 5e-25 

[SUPFAM] conserved hypothetical P115 protein 4e-20 

[SUPFAM] centromere protein E 5e-24 

[SUPFAM] calmodulin repeat homology 6e-18 

[SUPFAM] kinesin-related protein KLP61F 1e-25 

[SUPFAM] hypothetical protein MJ0914 3e-12 

[SUPFAM] kinesin-related protein MKLP-1 2e-63 

[SUPFAM] pleckstrin repeat homology 8e-26 

[SUPFAM] hypothetical protein MJ1322 4e-13 

[SUPFAM] kinesin-related protein KIF1B 3e-28 

[SUPFAM] kinesin motor domain homology 2e-63 

[SUPFAM] kinesin-related protein KLPA 7e-25 

[SUPFAM] kinesin-related protein nodA 1e-12 

[SUPFAM] kinesin-related protein Eg5 5e-30 

[PROSITE] ATP_GTP_A 1 

[PFAM] Kinesin motor domain 

[KW] Irregular 

[KW] 3D 

[KW] LOW_COMPLEXITY 7.53 % 

[KW] COILED_COIL 19.78 % 
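
The bracketed report lines above share one shape: a tag, a free-text description and, for most database hits, a trailing E-value. The following is a minimal Python sketch of a reader for such lines. The function name `parse_report_line` and the returned tuple layout are illustrative assumptions and are not part of the PEDANT output itself; lines whose brackets were damaged in extraction would need repair before parsing.

```python
import re

# One report line: "[TAG] description" with an optional trailing E-value
# written either in exponent form (e.g. "2e-63", "1.3e-08") or as "0.0".
_REPORT_LINE = re.compile(
    r"^\[(?P<tag>\w+)\]\s+(?P<text>.*?)"
    r"(?:\s+(?P<evalue>\d+(?:\.\d+)?e-\d+|0\.0))?\s*$"
)


def parse_report_line(line):
    """Return (tag, description, evalue); evalue is None when absent."""
    match = _REPORT_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a report line: {line!r}")
    evalue = match.group("evalue")
    return (
        match.group("tag"),
        match.group("text"),
        float(evalue) if evalue is not None else None,
    )
```

Keyword lines such as `[KW] COILED_COIL 19.78 %` carry a percentage rather than an E-value, so the sketch returns them with the whole remainder as description and `None` in the last field.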



SEQ MESNFNQEGVPRPSYVFSADPIARPSEINFDGIKLDLSHEFSLVAPNTEANSFESKDYLQ 

SEG 

COILS 

3kar- 

SEQ VCLRIRPFTQSEKELESEGCVHILDSQTVVLKEPQCILGRLSEKSSGQMAQKFSFSKVFG 

SEG 

COILS 

3kar- 

SEQ PATTQKEFFQGCIMQPVKDLLKGQSRLIFTYGLTNSGKTYTFQGTEENIGILPRTLNVLF 

SEG 

COILS 

3kar- 

SEQ DSLQERLYTKMNLKPHRSREYLRLSSEQEKEEIASKSALLRQIKEVTVHNDSDDTLYGSL 

SEG 

COILS 

3kar- 

SEQ TNSLNISEFEESIKDYEQANLNMANSIKFSVWVSFFEIYNEYIYDLFVPVSSKFQKRKML 

SEG 

COILS 

3kar- EEEEEEEEEEETTEEEETTTCC CCEE 

SEQ RLSQDVKGYSFIKDLQWIQVSDSKEAYRLLKLGIKHQSVAFTKLNNASSRSHSIFTVKIL 

SEG 

COILS 

3kar- eeetttte-eeeettcceeeccggghhhhhhhhhhhhccttttchhhhhhceeeeeeeee 

SEQ QIEDSEMSRVIRVSELSLCDLAGSERTMKTQNEGERLRETGNINTSLLTLGKCINVLKNS 

SEG 

COILS 

3kar- E— EETTTTCEEEEEEEEEECCCCCCC CCCHHHHHHHHHHHHHHHHHHHHHHHHTT 

SEQ EKSKFQQHVPFRESKLTHYFQSFFNGKGKICMIVNISQCYLAYDETLNVLKFSAIAQKVC 

SEG 

COILS 

3kar- TTTT--TCCTTTTTHHHHHHGGGCTTTTEEEEEEEECCCGGGHHHHHHHHHHHH 

SEQ VPDTLNSSQDKLFGPVKSSQDVSLDSNSNSKILNVKRATISWENSLEDLMEDEDLVEELE 

SEG xxxxxxxxxxxxxxxxxx 

COILS 

3kar- 

SEQ NAEETQNVETKLLDEDLDKTLEENKAFISHEEKRKLLDLIEDLKKKLINEKKEKLTLEFK 

SEG xxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxx . . 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ IREEVTQEFTQYWAQREADFKETLLQEREILEENAERRLAIFKDLVGKCDTREEAAKDIC 

SEG 

COILS CCCCCCC 

3kar- 

SEQ ATKVETEEATACLELKFNQIKAELAKTKGELIKTKEELKKRENESDSLIQELETSNKKII 

SEG 

COILS CCCCCCCCCCCCCCC 

3kar- 

SEQ TQNQRIKELINIIDQKEDTINEFQNLKSHMENTFKCNDKADTSSLIINNKLICNETVEVP 

SEG 

COILS CCCCCCCCCCCCCCC 

3kar- 

SEQ KDSKSKICSERKRVNENELQQDEPPAKKGSIHVSSAITEDQKKSEEVRPNIAEIEDIRVL 

SEG 

COILS CCCC 

3kar- 

SEQ QENNEGLRAFLLTIENELKNEKEEKAELNKQIVHFQQELSLSEKKNLTLSKEVQQIQSNY 

SEG xxxxxxxxxxxxxxxx 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ DIAIAELHVQKSKNQEQEEKIMKLSNEIETATRSITNNVSQIKLMHTKIDELRTLDSVSQ 

SEG 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ ISNIDLLNLRDLSNGSEEDNLPNTQLDLLGNDYLVSKQVKEYRIQEPNRENSFHSSIEAI 

SEG 

COILS 

3kar- 

SEQ WEECKEIVKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEHKNQDDLLKEKETL 

SEG xxxxxxxxxxxxx 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ IQQLKEELQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKVERSHS 

SEG 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ AKLEQDILEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMK 

SEG 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ HLLQLKEEEEETNRQETEKLKEELSASSARTQNLKADLQRKEEDYADLKEKLTDAKKQIK 

SEG .xxxxxxxxxxxxxxxxxxx 

COILS CCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ QVQKEVSVMRDEDKLLRIKINELEKKKNQCSQELDMKQRTIQQLKEQLNNQKVEEAIQQY 

SEG 

COILS CCCCCCCCCCCC 

3kar- 

SEQ ERACKDLNVKEKIIEDMRMTLEEQEQTQVEQDQVLEAKLEEVERLATELEKWKEKCNDLE 

SEG xxxxxxxxxxxxxxxxx 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ TKNNQRSNKEHENNTDVLGKLTNLQDELQESEQKYNADRKKWLEEKMMLITQAKEAENIR 

SEG 

COILS CC 

3kar- 

SEQ NKEMKKYAEDRERFFKQQNEMEILTAQLTEKDSDLQKWREERDQLVAALEIQLKALISSN 

SEG 

COILS 

3kar- 

SEQ VQKDNEIEQLKRIISETSKIETQIMDIKPKRISSADPDKLQTEPLSTSFEISRNKIEDGS 

SEG 

COILS 

3kar- 

SEQ WLDSCEVSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTVEIPKARKRKSNE 

SEG 

COILS 

3kar- 

SEQ MEEDLVKCENKKNATPRTNLKFPISDDRNSSVKKEQKVAIRPSSKKTYSLRSQASIIGVN 

SEG 

COILS 

3kar- 

SEQ LATKKKEGTLQKFGDFLQHSPSILQSKAKKIIETMSSSKLSNVEASKENVSQPKRAKRKL 

SEG 

COILS 

3kar- 

SEQ YTSEISSPIDISGQVILMDQKMKESDHQIIKRRLRTKTAK 

SEG 

COILS 

3kar- 



Prosite for DKFZphtes3_35b4.3 



PS00017 152->160 ATP_GTP_A PDOC00017 



Pfam for DKFZphtes3_35b4.3 
HMM_NAME Kinesin motor domain 

HMM *RCRPlNeREindgcscvVQWPpWtGyktvhnghegds phks . 

R+RP+ + E++ + +V + ++++ ++ + ++ 
Query 64 RIRPFTQSEKELESEGCVHILDSQTVVLKEPQCILGRLSEKSSGQMAQK 112 

HMM FtFDHVFWWncTQedVYdtvAHPIVDDcFhGYNCTIFAYGQTGSGKTYTM 

F+F +VF++++TQ++ +++ + V+D+++G IF+YG T SGKTYT 
Query 113 FSFSKVFGPATTQKEFFQGCIMQPVKDLLKGQSRLIFTYGLTNSGKTYTF 162 

HMM MGpggehPDHmGI I PRcCHDIFdr Idk f qekDhdFW 

G +++GI+PR+++ +FD++ + +++ 

Query 163 QG TEENIGILPRTLNVLFDSLQERL-YTKMNLKPHRSREYLRLSSE 207 

HMM 

Query 208 QEKEEIASKSALLRQIKEVTVHNDSDDTLYGSLTNSLNISEFEESIKDYE 257 

HMM hVkCSYMEI YNEelYDLLCPnP . . . qhMkpLnlHEHPN 

+V +S++EIYNE+IYDL +P++ Q++K L++ + + 
Query 258 QANLNMANSIKFSVWVSFFEIYNEYIYDLFVPVSSKFQKRKMLRLSQDVK 307 

HMM MGpYVqGCTEf HVcSYeDachWIWqGnknRHVAaTnMNdhSSRSHtlFTI 

++++++ V +A +++ +G K+ VA T++N SSRSH+IFT+ 
Query 308 GYSFIKDLQWIQVSDSKEAYRLLKLGIKHQSVAFTKLNNASSRSHSIFTV 357 

HMM HVeQrHk . qcdehvcHSKMNLVDLAGSERvnrTGAEGQRlKEGcNINqSL 

++ Q + + +++S ++L DLAGSER+ +T+ EG RL+E +NIN SL 
Query 358 KILQIEDSEMSRVIRVSELSLCDLAGSERTMKTQNEGERLRETGNINTSL 407 

HMM ttLGnVInaLaDgqTKYraYgghgHI PYRDSKLTWlLQDSLGGNcKTcMI A 
+TLG++IN+L + + + +H+P+R+SKLT+ +Q + G +K CMI+ 
Query 408 LTLGKC I NVLKNSE KSKFQQHVPFRESKLTHYFQSFFNGKGKICMIV 454 

HMM CIWPadWNYEETLSTLRYAdRAKnlkNkPQINEDPca* 

+1+ + Y+ETL++L++ + A+++ + ++N+++++ 
Query 455 NISQCYLAYDETLNVLKFSAIAQKVCVPDTLNSSQDK 491 



DKFZphtes3_35b5 



group: metabolism 

DKFZphtes3_35b5 encodes a novel 466 amino acid protein with similarity to the bovine accessory 
subunit of vacuolar ATPase and to rat C7-1 protein. 

The vacuolar proton-ATPase (V-ATPase) translocates protons into intracellular organelles or 
across the plasma membrane of specialized cells. The catalytic domain consists of a hexamer of 
3 A subunits and 3 B subunits, plus accessory subunits C, D, and E. The rat homolog C7-1 seems 
to be enriched in the frontal cortex of aged adult rats. 

The novel protein can find application in modulating V-ATPase activity in endocytic and 
secretory organelles. 



strong similarity to bovine vacuolar ATPase (EC 3.6.1.-) chain A 

complete cDNA, complete cds, potential start at bp 8, EST hits 
matches perfectly to I54197 hypothetical protein, but possesses 186 
additional aa at the N-terminus 

Sequenced by DKFZ 

Locus : unknown 

Insert length: 2043 bp 

Poly A stretch at pos. 2033, polyadenylation signal at pos. 2012 



1 GGCGGCCATG GCGACGGCTC GAGTGCGGAT GGGGCCGCGG TGCGCCCAGG 
51 CGCTCTGGCG CATGCCGTGG CTGCCGGTGT TTTTGTCGTT GGCGGCGGCG 
101 GCGGCGGCGG CAGCGGCGGA GCAGCAGGTC CCGCTGGTGC TGTGGTCGAG 
151 TGACCGGGAC TTGTGGGCTC CTGCGGCCGA CACTCATGAA GGCCACATCA 
201 CCAGCGACTT GCAGCTCTCT ACCTACTTAG ATCCCGCCCT GGAGCTGGGT 
251 CCCAGGAATG TGCTGCTGTT CCTGCAGGAC AAGCTGAGCA TTGAGGATTT 
301 CACAGCATAT GGCGGTGTGT TTGGAAACAA GCAGGACAGC GCCTTTTCTA 
351 ACCTAGAGAA TGCCCTGGAC CTGGCCCCCT CCTCACTGGT GCTTCCTGCC 
401 GTCGACTGGT ATGCAGTCAG CACTCTGACC ACTTACCTGC AGGAGAAGCT 
451 CGGGGCCAGC CCCTTGCATG TGGACCTGGC CACCCTGCGG GAGCTGAAGC 
501 TCAATGCCAG CCTCCCTGCT CTGCTGCTCA TTCGCCTGCC CTACACAGCC 
551 AGCTCTGGTC TGATGGCACC CAGGGAAGTC CTCACAGGCA ACGATGAGGT 
601 CATCGGGCAG GTCCTGAGCA CACTCAAGTC CGAAGATGTC CCATACACAG 
651 CGGCCCTCAC AGCGGTCCGC CCTTCCAGGG TGGCCCGTGA TGTAGCCGTG 
701 GTGGCCGGAG GGCTAGGTCG CCAGCTGCTA CAAAAACAGC CAGTATCACC 
751 TGTGATCCAT CCTCCTGTGA GTTACAATGA CACCGCTCCC CGGATCCTGT 
801 TCTGGGCCCA AAACTTCTCT GTGGCGTACA AGGACCAGTG GGAGGACCTG 
851 ACTCCCCTCA CCTTTGGGGT GCAGGAACTC AACCTGACTG GCTCCTTCTG 
901 GAATGACTCC TTTGCCAGGC TCTCACTGAC CTATGAACGA CTCTTTGGTA 
951 CCACAGTGAC ATTCAAGTTC ATTCTGGCCA ACCGCCTCTA CCCAGTGTCT 
1001 GCCCGGCACT GGTTTACCAT GGAGCGCCTC GAAGTCCACA GCAATGGCTC 
1051 CGTCGCCTAC TTCAATGCTT CCCAGGTCAC AGGGCCCAGC ATCTACTCCT 
1101 TCCACTGCGA GTATGTCAGC AGCCTGAGCA AGAAGGGTAG TCTCCTCGTG 
1151 GCCCGCACGC AGCCCTCTCC CTGGCAGATG ATGCTTCAGG ACTTCCAGAT 
1201 CCAGGCTTTC AACGTAATGG GGGAGCAGTT CTCCTACGCC AGCGACTGTG 
1251 CCAGCTTCTT CTCCCCCGGC ATCTGGATGG GGCTGCTCAC CTCCCTGTTC 
1301 ATGCTCTTCA TCTTCACCTA TGGCCTGCAC ATGATCCTCA GCCTCAAGAC 
1351 CATGGATCGC TTTGATGACC ACAAGGGCCC CACTATTTCT TTGACCCAGA 
1401 TTGTGTGACC CTGTGCCAGT GGGGGGGTTG AGGGTGGGAC GGTGTCCGTG 
1451 TTGTTGCTTT CCCACCCTGC AGCGCACTGG ACTGAAGAGC TTCCCTCTTC 
1501 CTACTGCAGC ATGAACTGCA AGCTCCCCTC AGCCCATCTT GCTCCCTCTT 
1551 CAGCCCGCTG AGGAGCTTTC TTGGGCTGCC CCCATCTCTC CCAACAAGGT 
1601 GTACATATTC TGCGTAGATG CTAGACCAAC CAGCTTCCCA GGGTTCGTCG 
1651 CTGTGAGGCG TAAGGGACAT GAATTCTAGG GTCTCCTTTC TCCTTATTTA 
1701 TTCTTGTGGC TACATCATCC CTGGCTGTGG ATAGTGCTTT TGTGTAGCAA 
1751 ATGCTCCCTC CTTAAGGTTA TAGGGCTCCC TGAGTTTGGG AGTGTGGAAG 
1801 TACTACTTAA CTGTCTGTCC TGCTTGGCTG CCGTTATCGT TTTCTGGTGA 
1851 TGTTGTGCTA ACAATAAGAA GTACACGGGT TTATTTCTGT GGCCTGAGAA 
1901 GGAAGGGACC TCCACGACAG GTGGGCTGGG TGCGATCGCC GGCTGTTTGG 
1951 CATGTTCCCA CCGGGAGTGC CGGGCAGGAG CATGGGGTGC TTGGTTGTTT 
2001 CCTTCCTAAT AAAATAAACG CGGGTCGCCA TGCAAAAAAA AAA 



BLAST Results 



No BLAST result 






Medline entries 



95014142: 

A novel accessory subunit for vacuolar H(+)-ATPase from chromaffin 
granules . 

97215246: 

Identification of a rat brain gene associated with aging by 
PCR differential display method. 



Peptide information for frame 2 



ORF from 8 bp to 1405 bp; peptide length: 466 
Category: strong similarity to known protein 
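
The peptide length quoted above follows directly from the ORF coordinates: the span from bp 8 to bp 1405 covers 1398 bases, which is 466 codons. A minimal Python sketch of that check is given below; the function name `orf_peptide_length` is an illustrative assumption, and the coordinates are taken to be 1-based, inclusive, and to exclude the stop codon, which is how the numbers in this entry work out.

```python
def orf_peptide_length(start_bp, end_bp):
    """Number of codons in an ORF given 1-based inclusive coordinates.

    Assumes the reported end position is the last base of the final
    sense codon (stop codon not included).
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"span of {span} bp is not a whole number of codons")
    return span // 3


# ORF from 8 bp to 1405 bp, reported peptide length 466
print(orf_peptide_length(8, 1405))
```

The same arithmetic can be applied to the other ORF coordinates given in this document, for example 368 to 679 bp and 84 to 1529 bp.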



1 MATARVRMGP RCAQALWRMP WLPVFLSLAA AAAAAAAEQQ VPLVLWSSDR 
51 DLWAPAADTH EGHITSDLQL STYLDPALEL GPRNVLLFLQ DKLSIEDFTA 
101 YGGVFGNKQD SAFSNLENAL DLAPSSLVLP AVDWYAVSTL TTYLQEKLGA 
151 SPLHVDLATL RELKLNASLP ALLLIRLPYT ASSGLMAPRE VLTGNDEVIG 
201 QVLSTLKSED VPYTAALTAV RPSRVARDVA VVAGGLGRQL LQKQPVSPVI 
251 HPPVSYNDTA PRILFWAQNF SVAYKDQWED LTPLTFGVQE LNLTGSFWND 
301 SFARLSLTYE RLFGTTVTFK FILANRLYPV SARHWFTMER LEVHSNGSVA 
351 YFNASQVTGP SIYSFHCEYV SSLSKKGSLL VARTQPSPWQ MMLQDFQIQA 
401 FNVMGEQFSY ASDCASFFSP GIWMGLLTSL FMLFIFTYGL HMILSLKTMD 
451 RFDDHKGPTI SLTQIV 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35b5, frame 2 

TREMBL:AF035387_1 gene: "C7-1"; product: "C7-1 protein"; Rattus 
norvegicus C7-1 protein (C7-1) mRNA, complete cds., N = 1, Score = 
2088, P = 3.8e-216 

PIR:A55116 vacuolar ATPase (EC 3.6.1.-) chain Ac45 - bovine, N = 1, 
Score = 2011, P = 5.5e-208 

PIR:I54197 hypothetical protein - human, N = 1, Score = 1464, P = 
5.1e-150 



>TREMBL:AF035387_1 gene: "C7-1"; product: "C7-1 protein"; Rattus norvegicus 

C7-1 protein (C7-1) mRNA, complete cds. 
Length = 463 

HSPs: 



Score = 2088 (313.3 bits), Expect = 3.8e-216, P = 3.8e-216 
Identities = 408/463 (88%), Positives = 426/463 (92%) 



Query:   4 ARVRMGPRCAQALWRMPWLPVFLSLAAAAAAAAAEQQVPLVLWSSDRDLWAPAADTHEGH 63 

           +R+R G R A LW + LSL A AAA AAEQQVPLVLWSSDRDLWAP ADTHEGH 
Sbjct:   8 SRIRTGTRWAPVLW LLLSLVAV AAA VAAEQQVPLVLWSSDRDLWAPV ADTHEGH 61 

Query:  64 ITSDLQLSTYLDPALELGPRNVLLFLQDKLSIEDFTAYGGVFGNKQDSAFSNLENALDLA 123 

           ITSD+QLSTYLDPALELGPRNVLLFLQDKLSIEDFTAYGGVFGNKQDSAFSNLENALDLA 
Sbjct:  62 ITSDMQLSTYLDPALELGPRNVLLFLQDKLSIEDFTAYGGVFGNKQDSAFSNLENALDLA 121 

Query: 124 PSSLVLPAVDWYAVSTLTTYLQEKLGASPLHVDLATLRELKLNASLPALLLIRLPYTASS 183 

           PSSLVLPAVDWYA+STLTTYLQEKLGASPLHVDLATL+ELKLNASLPALLLIRLPYTASS 
Sbjct: 122 PSSLVLPAVDWYAISTLTTYLQEKLGASPLHVDLATLKELKLNASLPALLLIRLPYTASS 181 

Query: 184 GLMAPREVLTGNDEVIGQVLSTLKSEDVPYTAALTAVRPSRVARDVAVVAGGLGRQLLQK 243 

           GLMAPREVLTGNDEVIGQVLSTL+SEDVPYTAALTAVRPSRVARDVA+VAGGLGRQLLQ 
Sbjct: 182 GLMAPREVLTGNDEVIGQVLSTLESEDVPYTAALTAVRPSRVARDVAMVAGGLGRQLLQT 241 

Query: 244 QPVSPVIHPPVSYNDTAPRILFWAQNFSVAYKDQWEDLTPLTFGVQELNLTGSFWNDSFA 303 

           Q  SP IHPPVSYNDTAPRILFWAQNFSVAYKD+W+DLT LTFGV+ LNLTGSFWNDSFA 
Sbjct: 242 QVASPAIHPPVSYNDTAPRILFWAQNFSVAYKDEWKDLTSLTFGVENLNLTGSFWNDSFA 301 

Query: 304 RLSLTYERLFGTTVTFKFILANRLYPVSARHWFTMERLEVHSNGSVAYFNASQVTGPSIY 363 

            LSLTYE LFG TVTFKFILA+R YPVSAR+WFTMERLE+HSNGSVA+FN SQVTGPSIY 
Sbjct: 302 MLSLTYEPLFGATVTFKFIIASRFYPVSARYWFTMERLEIHSNGSVAHEWSQVTGPSIY 361 

Query: 364 SFHCEYVSSLSKKGSLLVARTQPSPWQMMLQDFQIQAFNVMGEQFSYASDCASFFSPGIW 423 

SFHCEYVSSLSKKGSLLV PS WQM L +FQIQAFNV GEQFSYASDCA FFSPGIW 
Sbjct: 362 SFHCEYVSSLSKKGSLLVTNV-PSLWQMTLHNFQIQAFNVTGEQFSYASDCAGFFSPGIW 420 

Query: 424 MGLLTSLFMLFIFTYGLHMILSLKTMDRFDDHKGPTISLTQIV 466 

MGLLT+LFMLFIFTYGLHMILSLKTMDRFDD KGPTI+LTQIV 
Sbjct: 421 MGLLTTLFMLFIFTYGLHMILSLKTMDRFDDRKGPTITLTQIV 463 
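
The "Identities = 408/463 (88%)" figure in the HSP header is a count of aligned columns in which query and subject carry the same residue. As a rough illustration, the Python sketch below counts identities for the final alignment block only (query 424-466 against subject 421-463). The helper name `alignment_identity` and the `-` gap convention are assumptions for this sketch; the two strings are the residues of that block with extraction spacing removed.

```python
def alignment_identity(query, subject, gap="-"):
    """Return (identical_columns, aligned_columns) for two aligned strings.

    Both strings must have the same length; columns where either side
    holds the gap character are never counted as identical.
    """
    if len(query) != len(subject):
        raise ValueError("aligned strings must have equal length")
    identical = sum(
        1 for q, s in zip(query, subject) if q == s and q != gap
    )
    return identical, len(query)


# Last block of the C7-1 alignment (no gaps in this block).
QUERY_424_466 = "MGLLTSLFMLFIFTYGLHMILSLKTMDRFDDHKGPTISLTQIV"
SBJCT_421_463 = "MGLLTTLFMLFIFTYGLHMILSLKTMDRFDDRKGPTITLTQIV"

identical, columns = alignment_identity(QUERY_424_466, SBJCT_421_463)
print(f"{identical}/{columns}")  # 40/43 for this block
```

Summing such per-block counts over all blocks of an HSP gives the numerator and denominator reported in its header.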



Pedant information for DKFZphtes3_35b5, frame 2 



Report for DKFZphtes3_35b5.2 



[LENGTH] 466 

[MW] 51621.44 

[pI] 5.73 

[HOMOL] TREMBL:AF035387_1 gene: "C7-1"; product: "C7-1 protein"; Rattus norvegicus C7-1 protein (C7-1) mRNA, complete cds. 0.0 

[PIRKW] hydrolase 0.0 

[PROSITE] MYRISTYL 7 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 7 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 8 

[PROSITE] ASN_GLYCOSYLATION 7 

[KW] SIGNAL_PEPTIDE 38 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 11.59 % 



SEQ MATARVRMGPRCAQALWRMPWLPVFLSLAAAAAAAAAEQQVPLVLWSSDRDLWAPAADTH 

SEG xxxxxxxxx 

PRD ccceeeecccchhhhhhhcccchhhhhhhhhhhhhhhhhccceeeecccccccccccccc 

MEM 

SEQ EGHITSDLQLSTYLDPALELGPRNVLLFLQDKLSIEDFTAYGGVFGNKQDSAFSNLENAL 

SEG 

PRD ccccccchhhhhccccccccccccceeecccccccccccccccccccccchhhhhhhhcc 

MEM 

SEQ DLAPSSLVLPAVDWYAVSTLTTYLQEKLGASPLHVDLATLRELKLNASLPALLLIRLPYT 

SEG xxxxxxxxxxxxxxx. . . 

PRD ccccccccccccceeeeehhhhhhhhhhccccchhhhhhhhhhhhhhcchhhhhhhcccc 

MEM 

SEQ ASSGLMAPREVLTGNDEVIGQVLSTLKSEDVPYTAALTAVRPSRVARDVAVVAGGLGRQL 

SEG xxxxxxxxxxxxxxxxxxxx . . 

PRD cccccceeeeeecccccchhhhhhhccccccchhhhhhhccccceeehhhhhccccchhh 

MEM 

SEQ LQKQPVSPVIHPPVSYNDTAPRILFWAQNFSVAYKDQWEDLTPLTFGVQELNLTGSFWND 

SEG 

PRD hhhhccccccccccccccccceeeeeccccceeeeccccccccceeeeeecccccccccc 

MEM 

SEQ SFARLSLTYERLFGTTVTFKFILANRLYPVSARHWFTMERLEVHSNGSVAYFNASQVTGP 

SEG 

PRD hhhhhhhhhhhhccceeeeeeecccccccccchhhhhhhhhhcccccceeeeeecccccc 

MEM 

SEQ SIYSFHCEYVSSLSKKGSLLVARTQPSPWQMMLQDFQIQAFNVMGEQFSYASDCASFFSP 

SEG xxxxxxxxxx 

PRD ceeeeeeeeeeecccccceeeeeccccchhhhhhhhheeeeccccccccccccccccccc 

MEM MMMMMM 

SEQ GIWMGLLTSLFMLFIFTYGLHMILSLKTMDRFDDHKGPTISLTQIV 

SEG 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccceeeeccc 

MEM MMMMMMMMMMMMMMMMMMMMMMM 



Prosite for DKFZphtes3_35b5.2 

PS00001 166->170 ASN_GLYCOSYLATION PDOC00001 
PS00001 257->261 ASN_GLYCOSYLATION PDOC00001 
PS00001 269->273 ASN_GLYCOSYLATION PDOC00001 
PS00001 292->296 ASN_GLYCOSYLATION PDOC00001 
PS00001 299->303 ASN_GLYCOSYLATION PDOC00001 
PS00001 346->350 ASN_GLYCOSYLATION PDOC00001 
PS00001 353->357 ASN_GLYCOSYLATION PDOC00001 
PS00004 375->379 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 3->6 PKC_PHOSPHO_SITE PDOC00005 
PS00005 48->51 PKC_PHOSPHO_SITE PDOC00005 
PS00005 159->162 PKC_PHOSPHO_SITE PDOC00005 
PS00005 205->208 PKC_PHOSPHO_SITE PDOC00005 
PS00005 318->321 PKC_PHOSPHO_SITE PDOC00005 
PS00005 331->334 PKC_PHOSPHO_SITE PDOC00005 
PS00005 374->377 PKC_PHOSPHO_SITE PDOC00005 
PS00005 445->448 PKC_PHOSPHO_SITE PDOC00005 
PS00006 48->52 CK2_PHOSPHO_SITE PDOC00006 
PS00006 72->76 CK2_PHOSPHO_SITE PDOC00006 
PS00006 94->98 CK2_PHOSPHO_SITE PDOC00006 
PS00006 114->118 CK2_PHOSPHO_SITE PDOC00006 
PS00006 159->163 CK2_PHOSPHO_SITE PDOC00006 
PS00006 193->197 CK2_PHOSPHO_SITE PDOC00006 
PS00006 255->259 CK2_PHOSPHO_SITE PDOC00006 
PS00007 207->214 TYR_PHOSPHO_SITE PDOC00007 
PS00008 102->108 MYRISTYL PDOC00008 
PS00008 103->109 MYRISTYL PDOC00008 
PS00008 200->206 MYRISTYL PDOC00008 
PS00008 295->301 MYRISTYL PDOC00008 
PS00008 314->320 MYRISTYL PDOC00008 
PS00008 421->427 MYRISTYL PDOC00008 
PS00008 425->431 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphtes3_35b5.2) 
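
The ASN_GLYCOSYLATION entries (PS00001) reported for DKFZphtes3_35b5.2 correspond to the PROSITE pattern N-{P}-[ST]-{P}: an asparagine, any residue except proline, a serine or threonine, then again any residue except proline. The Python sketch below re-scans the 466-residue peptide for that pattern with a regular expression. The names `PEPTIDE_35B5` and `find_n_glycosylation_sites` are illustrative assumptions, and this is only a pattern scan, not the tool that produced the report.

```python
import re

# Peptide of DKFZphtes3_35b5, frame 2 (466 residues), as listed in the
# SEQ rows of the report.
PEPTIDE_35B5 = (
    "MATARVRMGPRCAQALWRMPWLPVFLSLAAAAAAAAAEQQVPLVLWSSDRDLWAPAADTH"
    "EGHITSDLQLSTYLDPALELGPRNVLLFLQDKLSIEDFTAYGGVFGNKQDSAFSNLENAL"
    "DLAPSSLVLPAVDWYAVSTLTTYLQEKLGASPLHVDLATLRELKLNASLPALLLIRLPYT"
    "ASSGLMAPREVLTGNDEVIGQVLSTLKSEDVPYTAALTAVRPSRVARDVAVVAGGLGRQL"
    "LQKQPVSPVIHPPVSYNDTAPRILFWAQNFSVAYKDQWEDLTPLTFGVQELNLTGSFWND"
    "SFARLSLTYERLFGTTVTFKFILANRLYPVSARHWFTMERLEVHSNGSVAYFNASQVTGP"
    "SIYSFHCEYVSSLSKKGSLLVARTQPSPWQMMLQDFQIQAFNVMGEQFSYASDCASFFSP"
    "GIWMGLLTSLFMLFIFTYGLHMILSLKTMDRFDDHKGPTISLTQIV"
)

# N-{P}-[ST]-{P}; the lookahead lets overlapping matches be reported.
_N_GLYC = re.compile(r"(?=N[^P][ST][^P])")


def find_n_glycosylation_sites(peptide):
    """Return 1-based start positions of N-{P}-[ST]-{P} matches."""
    return [m.start() + 1 for m in _N_GLYC.finditer(peptide)]


print(find_n_glycosylation_sites(PEPTIDE_35B5))
# [166, 257, 269, 292, 299, 346, 353]
```

The seven start positions agree with the seven ASN_GLYCOSYLATION rows of the Prosite listing, where each row is written as start->start+4.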

group: differentiation/development 

DKFZphtes3_35e21.2 encodes a novel 104 amino acid putative interleukin precursor, related to 
interleukin-7. 

Due to the close relationship to human interleukin-7, the novel interleukin is expected to act 
as a new growth factor for human B lineage cells. Additionally, the protein should induce the 
gene rearrangement of the T-cell receptor repertoire, leading to thymocyte commitment, and 
subsequently induce both cytotoxic T-cell- and lymphocyte-activated killer cells. 

This new interleukin could find clinical application in a variety of conditions of 
hematolymphopoietic failure and different tumours, because of its recruitment of B cell 
lineage cells, cytotoxic T-cell- and lymphocyte-activated killer cells. 



similarity to interleukin-7 precursor 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus : unknown 

Insert length: 2095 bp 

Poly A stretch at pos. 2085, polyadenylation signal at pos. 2067 



1 GGATGAAAGT GATTTAATTC ATTTTTAGAA TTTTTTTTTT GTTTTGTTTT 
51 AGCAACATGC TGAACAACTA ATTTACTTTA AAAATAAGCC AGTTAAAACA 
101 AAGGACGCTA AGCCCAAGTG GGGGGCAATA TTAGTCAGGA TCTTTGGGGT 
151 CTAATTCCAG ACCAACTTTC AGAAGCACTT CTTTGTCTCT GTTCTCACCT 
201 CTGCTGTCCC TCTCTTCCCT CATCCCCTAA GAGAGACAAA GATAAAAGCC 
251 CACCTGCATC CCTAAGTCTT ACTGAGATCA GCCACCCCAG GGGAGAGAAA 
301 CTGGATCTAC TTACAGCCAC CCCCTGTTTC CATCCATATA CTTACTTCCC 
351 CCAATTTGCA TGTGATTATG GAAACAAGTC ATGCTCATGA AAGCAACTGT 
401 AAAATAAAAG GTTATGGAGT AGTTCAGCAA CTTCTTCACA GCCAGCTTTG 
451 TGGAGCTGGG GAGGACTTAG GGCCCATTGG AGTCTCTTAT GTGTACAGCT 
501 TCAGGGCTGT CCCTTTCAGT TTGATTTTAA GCAATGCCTC ACTTCATAGC 
551 TTAGGGGGTA AGGATTCCAT TCAGGTAGGT TGTCTAAAGG AACTAATGGG 
601 ACCTCTCAGT GAATTAGCTG ACCAGATTTT AGGAAATCTT TTTAATTTCT 
651 ATGATTTTCC TTCTCACATT TTGAAATGGT AAAATTGACT GGAAATAATT 
701 TTTCTTGGTG CCTTATTGGT TTTCCTTGCA AACCTTTCTC ATATTTTCTC 
751 ATGACCATTG CCAGTGACCA AGGCCCATGT GTGTGTTGTG TGTAATTGTG 
801 GGCATGTACA AGCTTAAATA ACGTGCCGAC AGCACTGTTT CAAAGTTGGT 
851 ATTCATTAGG CTGTTGCCTC CTGGGCTGGA GCTGCGCTAA TCCTGACACC 
901 GGCTGCCAGG AGAAAACCTC ATGGATCACA CACCAAACCT TAATAACAGC 
951 ATCCGTGACC TGCACTCTCC AGTACAGAAT GGGAACCCCA GAGCTAGGAA 
1001 ATGTAGTTGT ATATTTTAAT GAACTGCTAC CCCAGCCAAA GAAGCTTCTT 
1051 TCACTTTTGT GCTCTACAGA AAGCCCAAGG GGGGTAGGAG GGACAGAGCT 
1101 TTGAATAACT GCTTTCTAAC ACTAAATGTG GCCAACAGGA CAGAGCACAT 
1151 CACACGTATA GGCAGGTGTG AGGGACAGTG GCTAAGAATT GCCTGCTCCC 
1201 TCTGCATGCT CTTTCTTGTT TCCAAAGTCC AATCAAGTGA TCCTGGGAAA 
1251 CAAATCTGTC TGGATTGCGG AGGGTGGTTC TGAAAGAACT GCCAAGACGT 
1301 TAAAGAAGGG TGAAGAGTAG GCAGAATATA AGTAGCTAAC CTGAGTCAAG 
1351 ACTCTCAAAA GCTAGCAGCC TGATGACAAT AGGATTTATT TCAGCCAGGA 
1401 TAGTGTCTGT CTGTGAGTGC ATCATTTTAA GACAGTATGA CTTCATGTTG 
1451 TTACAAACTA TGTATAGTAT GTATGTTTTG TGGGTTGTAT ATATACATAA 
1501 TATATATTAT ATATATATAT GAGAGATTTG GTGACTTTTG ATACGGGTTT 
1551 GGTGCAGGTG AATTTATTAC TGAGCCAAAT GAGGCACATA CCGAGTCAGT 
1601 AGTTGAAGTC CAGGGCATTC GATACTGTTT ATGATTTCCA TATATGTATA 
1651 GTGCCTATCC CATGCTGTAG TCACTGTTAT GTTAAATCCA GAAGTTACAC 
1701 TAGAGCCAGC GATACTTTAT TTGTAGACAA TCAATTTGAA TCCATATGTT 
1751 ATTACTGGCA GATGATACAT GATTACAGTT CTGAATCTGT AACACTTACA 
1801 AAAGGAAACC CAGAGCAGCT TGATGAGTTT TTGTTTCTGC TTCGTTCCTG 
1851 GGAGTCAGTA GAAACAGCAG TTGTATGTGG TTATGTTAGT CTCAAGATAC 
1901 TTAATTTGTT GACCTTACTT CAGAAAAATT TTGTATGTAT TATATTTGTG 
1951 GGAAGGTAAA ATAATCATTT GAGATTTTTA TCAAATATGA AGATTAGTTA 
2001 TTTATGAAAA ACAAAGAAAT GTCTATTTTT CTTTGTTCCC AATTAATGTA 
2051 GATAAATTTT AAAATGCATT AAAGTAATGG TCCGGAAAAA AAAAA 



BLAST Results 



No BLAST result 



Medline entries 



89098903: 

Human interleukin 7: molecular cloning and growth factor 
activity on human and murine B-lineage cells. 



Peptide information for frame 2 



ORF from 368 bp to 679 bp; peptide length: 104 
Category: similarity to known protein 



1 METSHAHESN CKIKGYGVVQ QLLHSQLCGA GEDLGPIGVS YVYSFRAVPF 
51 SLILSNASLH SLGGKDSIQV GCLKELMGPL SELADQILGN LFNFYDFPSH 
101 ILKW 

BLASTP hits 

Entry B32223 from database PIR: 
interleukin-7 precursor (clone 1) - human 

Score = 66, P = 7.0e-01, identities = 21/70, positives = 33/70 



Alert BLASTP hits for DKFZphtes3_35e21, frame 2 

PIR:B32223 interleukin-7 precursor (clone 1) - human, N = 1, Score = 
66, P = 0.72 

TREMBL:PADAL1_1 gene: "dal1"; P.abies dal1 mRNA, N = 2, Score = 59, P 
= 0.77 

PIR:C32223 interleukin-7 precursor (clone 4) - human, N = 1, Score = 
66, P = 0.79 

TREMBL:PRU76726_1 gene: "PrMADS3"; product: "MADS-box protein"; Pinus 
radiata MADS-box protein (PrMADS3) mRNA, complete cds., N = 2, Score = 
59, P = 0.94 

>PIR:B32223 interleukin-7 precursor (clone 1) - human 
Length = 133 

HSPs: 

Score = 66 (9.9 bits), Expect = 1.3e+00, P = 7.2e-01 
Identities = 21/68 (30%), Positives = 33/68 (48%) 

Query: 39 VSYVYSFRAVPFSLIL-----SNASLHSLGGK--DSIQVGCLKELMGPLSELADQILGNL 91 

          VS+ Y F   P  L+L     S+  +    GK  +S+ +  + +L+  + E+    L N 

Sbjct:  4 VSFRYIFGLPPLILVLLPVASSDCDIEGKDGKQYESVLMVSIDQLLDSMKEIGSNCLNNE 63 

Query: 92 FNFYDFPSHI 101 

          FNF  F  HI 
Sbjct: 64 FNF--FKRHI 71 

Pedant information for DKFZphtes3_35e21, frame 2 

Report for DKFZphtes3_35e21.2 

[LENGTH] 104 

[MW] 11339.12 

[pI] 5.87 

[PROSITE] MYRISTYL 2 

[PROSITE] PKC_PHOSPHO_SITE 1 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] Alpha_Beta 

SEQ METSHAHESNCKIKGYGVVQQLLHSQLCGAGEDLGPIGVSYVYSFRAVPFSLILSNASLH 
PRD ccchhhhhcccccccchhhhhhhhhhhcccccccccceeeeeeeccccceeeeecccccc 

SEQ SLGGKDSIQVGCLKELMGPLSELADQILGNLFNFYDFPSHILKW 
PRD cccccceeeccccccccccchhhhhhhhcccccccccccccccc 



Prosite for DKFZphtes3_35e21.2 

PS00001 56->60 ASN_GLYCOSYLATION PDOC00001 

PS00005 44->47 PKC_PHOSPHO_SITE PDOC00005 

PS00008 63->69 MYRISTYL PDOC00008 

PS00008 89->95 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphtes3_35e21.2) 



DKFZphtes3_35g6 
group: testes derived 

DKFZphtes3_35g6 encodes a novel 482 amino acid protein with high partial similarity to H. 
sapiens chromosome 19, cosmid R27216. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

strong similarity to R27216_1 
complete cDNA, complete cds, EST hits 
Sequenced by DKFZ 
Locus: /map^'lS" 
Insert length: 3177 bp 

Poly A stretch at pos. 3167, polyadenylation signal at pos. 3148 

1 GGAGGCAGCG CCGGCCTCCG GAGGCGGCCT GGGCGATGGC GGCGGAGTTT 

51 TGTCCATAAC CTGGGCAACC GCGCAGCTGG AGGATGGCCT CACTCGGGCC 

101 TGCCGCAGCT GGGGAGCAGG CGTCGGGGGC TGAGGCGGAG CCGGGCCCCG 

151 CGGGGCCGCC GCCGCCGCCC TCACCGTCCT CTCTGGGGCC CCTGCTCCCC 

201 CTGCAGCGGG AACCTCTCTA CAACTGGCAG GCGACCAAGG CGTCGCTGAA 

251 GGAGCGCTTC GCCTTCCTCT TCAACTCGGA GCTGCTGAGC GATGTGCGCT 

301 TCGTACTGGG CAAGGGTCGC GGCGCCGCCG CCGCTGGGGG CCCGCAGCGC 

351 ATCCCCGCCC ACCGCTTCGT GCTGGCGGCC GGCAGCGCCG TCTTTGACGC 

401 CATGTTCAAC GGCGGCATGG CCACCACGTC GGCCGAGATC GAGCTGCCGG 

451 ACGTGGAGCC CGCAGCCTTC CTGGCGCTGC TGAGATTTCT ATATTCAGAT 

501 GAAGTTCAAA TTGGTCCAGA AACAGTTATG ACCACTCTTT ATACTGCCAA 

551 GAAATACGCA GTCCCAGCCT TGGAAGCACA CTGTGTAGAA TTTCTCACCA 

601 AACATCTTAG GGCAGATAAT GCCTTTATGT TACTTACTCA GGCTCGATTA 

651 TTTGATGAAC CTCAGCTTGC TAGTCTTTGT CTAGATACAA TAGACAAAAG 

701 CACAATGGAT GCAATAAGTG CAGAAGGGTT TACTGATATT GATATAGATA 

751 CACTCTGTGC AGTTTTAGAG AGAGACACAC TCAGTATTCG AGAAAGTCGA 

801 CTTTTTGGAG CTGTTGTACG CTGGGCAGAA GCAGAATGTC AGAGACAACA 

851 ATTACCTGTG ACTTTTGGGA ATAAACAAAA AGTTCTAGGA AAAGCACTTT 

901 CCTTAATCCG GTTCCCACTG ATGACAATTG AGGAATTTGC AGCAGGTCCT 

951 GCTCAATCTG GAATTTTGTC AGATCGTGAA GTGGTAAACC TCTTTCTTCA 

1001 TTTTACTGTC AACCCTAAAC CCCGAGTTGA ATACATTGAC CGACCAAGAT 

1051 GCTGTCTCAG GGGAAAGGAA TGCTGCATCA ATAGATTCCA GCAAGTAGAA 

1101 AGCCGCTGGG GTTACAGTGG GACGAGTGAT CGAATCAGAT TCACAGTTAA 

1151 TAGAAGGATC TCTATAGTTG GATTTGGCTT GTATGGATCT ATTCATGGCC 

1201 CTACAGATTA TCAAGTGAAT ATACAGATCA TTGAATATGA GAAAAAGCAA 

1251 ACCCTGGGAC AGAATGATAC CGGCTTTAGT TGTGATGGGA CAGCTAACAC 

1301 ATTCAGGGTC ATGTTCAAGG AACCCATAGA GATCCTGCCC AATGTGTGCT 

1351 ACACAGCATG TGCAACACTC AAAGGTCCAG ATTCCCACTA TGGCACAAAA 

1401 GGATTGAAGA AAGTAGTGCA TGAGACACCT GCTGCAAGCA AGACTGTTTT 

1451 TTTCTTTTTT AGTTCCCCTG GCAATAATAA TGGCACTTCA ATAGAAGATG 

1501 GACAAATTCC AGAAATCATA TTTTATACAT AATTTAGCAT TATAATACAT 

1551 CTTGGCTAAA TAATACCATA CAATCTAGTG TCAAAAACAT AAATGGCCAC 

1601 AAAAAAGTAG TTTGAGTGTT ATGAATATTT AAAATTGTAA GATAAGAAAC 

1651 AGTTTCTTAG AGCAGATAGA AAAATGCTTA TTTAAATCTT TGCATGATTT 

1701 AAAAACAGAT TTTCCATTTT CTTACAACTT TAAGAGAAAA GAACTGGGTT 

1751 TAATGGTTTA AAAAAAAGCA CAGCTTTTTC ACCTTCATCT TGTATAATTT 

1801 CATAGATTGG CTGACTTAGG GTCTTTCAAT AGTTTGGGAA TTGAAAGATT 

1851 CTTGTTATAT ATAGCTAGTT TGGGTTTGTT TTTGTTTTAA CTATTTTGAA 

1901 GGTTAGGTGA GATGGGCAAA TAGGCTTAAC TATTTTGAAG GTTGGATGAA 

1951 AAGAGATGGG TCAGTATTCG TACAGAATTC TTATTAACTC AAATAACTAA 

2001 ATTTCAGAAA ATTAAGAAGC TGACTTTATA TTTGGTGGTT TGAAGTATCT 

2051 TGTTGTTAGC ATTTGTAATA ATGCTAAAAA AGGCCTAATA AAATGCCCAA 

2101 GAAAATATTC AGTGCATTTA TAGAGAAGGA TATTTTGTAG TAGTATAGTA 

2151 ATGTGTTATG TAGTACAGTT TTAAAGCTAT AAATGGAATT TTGTGTAAAT 

2201 TCACAAAAAT GTGATATAAA CAGGATCTAA GACTGGATTC CCTGTCACTA 

2251 AACTGCACCA CTATACCTGT CTCTCTGTGT GGGGGACACT GCTGATGATT 

2301 CCCAAGATTG AGATGATGAC GGTGATGACG ACTGGGTGAA CAGCCATCAC 

2351 TTCAACATTG TGATAATCCT TCACAGCAAG AAACCGAATA AAATACTAAC 

2401 ATTTCTAACA ACTGCTCTGA CATTGTAAAG AGATCCAACA GAATCACTCC 

2451 TGCTGAAAAA TACGCTTTCT GCCACCTACA CATTTCTATT TAGGAAGTAA 

2501 AATTTGCTTC ATGGTCATGA CCCCATTAGT CAGTGTTACA GCTGTGTTGG 

2551 GGATAGGAAG TATATCTGGC AGATTGACAT TTATACACTT TTTTATAAAG 

2601 CAGATTTTAA AATATAGTAA CATCCATTTT TTTCCCTTGA AAGTGATTCT 

2651 CTTATAAAAA ATGAAAGTGG AGTTTAAGGT ATATCAAATC GTTGTGGAAG 

2701 GTGATTAAAA ATCAAAATTC TTTTAAATAT CAACTTAATT TTTTCTAAGT 

2751 AAGATACAAA AAATTTTCAT CTAAAGTAAT ATTTCACTTT ATATTGTAAA 

2801 GAAGGTAGGT ATATTGGTGG CTGAGGTCTC TTGAAATTGC TAAAGGGAAA 

2851 TTTTTCTATG GTAATGCTCT TACGGATATA AGCCTCAGTT AAATGGAATT 

2901 ATCTATGGGA TGTGTGGTTC TGGTTAACTA AAAATTAACC AGTAAACACT 

2951 CTGTAGTAAC CATTACAGAA AATACTTCTG CCTTAAAAAA TATGATATGC 

3001 CAGAGATGAG TTAGTGTTTC TTGACGTTGG AGACCTATAA ATGCCTCATC 

3051 TGTTGTACTG AACAATTGAA ACTGCATGCA GCCATAAAAG GGACAAGAAA 

3101 CAGAACTGTT TACTAACTTT GGGACATCCC CTGGAGTTTT TAAAAATAAA 

3151 TAAATATATA TATATATAAA AAAAAAA 



BLAST Results 



Entry G37753 from database EMBL: 
SHGC-63477 Human Homo sapiens STS genomic. 
Score = 1627, P = 3.0e-66, identities = 327/329 

Entry G37752 from database EMBL: 
SHGC-63476 Human Homo sapiens STS genomic. 
Score = 1578, P = 6.2e-64, identities = 320/324 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 84 bp to 1529 bp; peptide length: 482 
Category: similarity to unknown protein 



1 MASLGPAAAG EQASGAEAEP GPAGPPPPPS PSSLGPLLPL QREPLYNWQA 
51 TKASLKERFA FLFNSELLSD VRFVLGKGRG AAAAGGPQRI PAHRFVLAAG 
101 SAVFDAMFNG GMATTSAEIE LPDVEPAAFL ALLRFLYSDE VQIGPETVMT 
151 TLYTAKKYAV PALEAHCVEF LTKHLRADNA FMLLTQARLF DEPQLASLCL 
201 DTIDKSTMDA ISAEGFTDID IDTLCAVLER DTLSIRESRL FGAVVRWAEA 
251 ECQRQQLPVT FGNKQKVLGK ALSLIRFPLM TIEEFAAGPA QSGILSDREV 
301 VNLFLHFTVN PKPRVEYIDR PRCCLRGKEC CINRFQQVES RWGYSGTSDR 
351 IRFTVNRRIS IVGFGLYGSI HGPTDYQVNI QIIEYEKKQT LGQNDTGFSC 
401 DGTANTFRVM FKEPIEILPN VCYTACATLK GPDSHYGTKG LKKVVHETPA 
451 ASKTVFFFFS SPGNNNGTSI EDGQIPEIIF YT 

BLASTP hits 

Entry AC005306_2 from database TREMBL: 

product: "R27216_1"; Homo sapiens chromosome 19, cosmid R27216,
complete sequence.

Score = 1298, P = 1.9e-132, identities = 245/297, positives = 268/297

Entry CEF38H4_9 from database TREMBLNEW:

gene: "F38H4.7"; Caenorhabditis elegans cosmid F38H4

Score = 1237, P = 5.6e-126, identities = 248/446, positives = 322/446

Entry AC004678_1 from database TREMBL:

product: "R34094_1"; Homo sapiens chromosome 19, cosmid R34094,
complete sequence.

Score = 555, P = 1.0e-53, identities = 112/137, positives = 123/137



Alert BLASTP hits for DKFZphtes3_35g6, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_35g6, frame 3 



Report for DKFZphtes3_35g6.3 



[LENGTH] 482 

[MW] 52771.47 

[pI] 5.79

[HOMOL] TREMBL:AC005306_2 product: "R27216_1"; Homo sapiens chromosome 19, cosmid R27216, complete sequence, 1e-142

[BLOCKS] BL01075D Acetate and butyrate kinases family proteins 

[SUPFAM] P02 domain homology 3e-08 

[SUPFAM] A55R protein middle region homology 5e-06 

[SUPFAM] A55R protein 5e-06 

[SUPFAM] A55R protein carboxyl-terminal homology 5e-06

[PROSITE] MYRISTYL 6

[PROSITE] CAMP_PHOSPHO_SITE 2

[PROSITE] CK2_PHOSPHO_SITE 9

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 7 

[PROSITE] ASN_GLYCOSYLATION 2 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 11.20 % 

SEQ MASLGPAAAGEQASGAEAEPGPAGPPPPPSPSSLGPLLPLQREPLYNWQATKASLKERFA 

SEG . . . . xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccchhhhhhhhcccccccccccccccccccccccccccccchhhhhhhhhhhhhh 

SEQ FLFNSELLSDVRFVLGKGRGAAAAGGPQRIPAHRFVLAAGSAVFDAMFNGGMATTSAEIE 

SEG xxxxxxxxxxx 

PRD hhhccccccceeeeecccccccccccccchhhhheeecccchhhhhhhhcchhhhhhhee 

SEQ LPDVEPAAFLALLRFLYSDEVQIGPETVMTTLYTAKKYAVPALEAHCVEFLTKHLRADNA 

SEG 

PRD ecccchhhhhhhhhhhhccceeechhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccch 

SEQ FMLLTQARLFDEPQLASLCLDTIDKSTMDAISAEGFTDIDIDTLCAVLERDTLSIRESRL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhccccchhhhhh 

SEQ FGAVVRWAEAECQRQQLPVTFGNKQKVLGKALSLIRFPLMTIEEFAAGPAQSGILSDREV

SEG 

PRD hhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhcceeecccccccccccccchhhhh 

SEQ VNLFLHFTVNPKPRVEYIDRPRCCLRGKECCINRFQQVESRWGYSGTSDRIRFTVNRRIS

SEG 

PRD hhhhheeeccccceeeeecccceeeccceeehhhhhhhhhccccccccccchhhhhceee 

SEQ IVGFGLYGSIHGPTDYQVNIQIIEYEKKQTLGQNDTGFSCDGTANTFRVMFKEPIEILPN 

SEG 

PRD eeeccccccccccchhhhhhhcchhhhhhhhccccccccccccccceeeeeccceeeccc 

SEQ VCYTACATLKGPDSHYGTKGLKKVVHETPAASKTVFFFFSSPGNNNGTSIEDGQIPEIIF

SEG xxxxxx 

PRD ccceeeeecccccccccccceeeeeeeccccceeeeeeeecccccccccccccccceeec 

SEQ YT 
SEG 

PRD cc



Prosite for DKFZphtes3_35g6.3



PS00001 394->398 ASN_GLYCOSYLATION PDOC00001
PS00001 466->470 ASN_GLYCOSYLATION PDOC00001
PS00004 357->361 CAMP_PHOSPHO_SITE PDOC00004
PS00004 387->391 CAMP_PHOSPHO_SITE PDOC00004
PS00005 54->57 PKC_PHOSPHO_SITE PDOC00005
PS00005 154->157 PKC_PHOSPHO_SITE PDOC00005
PS00005 234->237 PKC_PHOSPHO_SITE PDOC00005
PS00005 296->299 PKC_PHOSPHO_SITE PDOC00005
PS00005 348->351 PKC_PHOSPHO_SITE PDOC00005
PS00005 406->409 PKC_PHOSPHO_SITE PDOC00005
PS00005 428->431 PKC_PHOSPHO_SITE PDOC00005
PS00006 14->18 CK2_PHOSPHO_SITE PDOC00006
PS00006 54->58 CK2_PHOSPHO_SITE PDOC00006
PS00006 115->119 CK2_PHOSPHO_SITE PDOC00006
PS00006 206->210 CK2_PHOSPHO_SITE PDOC00006
PS00006 217->221 CK2_PHOSPHO_SITE PDOC00006
PS00006 234->238 CK2_PHOSPHO_SITE PDOC00006
PS00006 281->285 CK2_PHOSPHO_SITE PDOC00006
PS00006 296->300 CK2_PHOSPHO_SITE PDOC00006
PS00006 468->472 CK2_PHOSPHO_SITE PDOC00006
PS00007 430->437 TYR_PHOSPHO_SITE PDOC00007
PS00008 80->86 MYRISTYL PDOC00008
PS00008 110->116 MYRISTYL PDOC00008
PS00008 365->371 MYRISTYL PDOC00008
PS00008 392->398 MYRISTYL PDOC00008
PS00008 402->408 MYRISTYL PDOC00008
PS00008 463->469 MYRISTYL PDOC00008



(No Pfam data available for DKFZphtes3_35g6.3)

DKFZphtes3_35k16



group: metabolism 

DKFZphtes3_35k16 encodes a novel 666 amino acid protein with weak similarity to fatty acid-CoA
synthetases/ligases.

The novel protein contains a putative AMP-binding domain signature, which is present in
enzymes that act via an ATP-dependent covalent binding of AMP to their substrate. This
domain is found in several CoA synthetases, such as acetate-CoA ligase (EC 6.2.1.1),
long-chain-fatty-acid-CoA ligase (EC 6.2.1.3) and bile acid-CoA ligase. Therefore it is a
new fatty acid-CoA synthetase/ligase with unknown substrate.

The new protein can find application in modulation of fatty acid metabolism and as a new
enzyme for biotechnological production processes.



similarity to acyl-CoA synthetase 

complete cDNA, complete cds, potential start codon at bp 50,
few EST hits, seems to be a testis-specific cDNA,
5 of 6 EST hits are from testis-derived libraries

Sequenced by DKFZ 

Locus : unknown 

Insert length: 2520 bp 

Poly A stretch at pos. 2510, polyadenylation signal at pos. 2490 



1 CAGATGTCCC AGCTCCAGTG CTGTGGAGCA TGGTTTCTGC ACACCTGGAA
51 TGACTGGAAC CCCAAAGACT CAAGAAGGAG CTAAAGATCT TGAAGTAGAC
101 ATGAATAAAA CAGAAGTTAC TCCCAGGCTG TGGACCACCT GTCGAGATGG
151 AGAAGTCCTT CTGAGGCTAT CCAAACACGG ACCAGGCCAT GAGACCCCGA
201 TGACCATCCC TGAATTTTTT CGAGAGTCAG TCAACCGATT TGGAACTTAT
251 CCAGCCCTCG CATCCAAGAA TGGCAAAAAG TGGGAAATTC TGAATTTCAA
301 CCAGTACTAT GAGGCTTGTC GGAAGGCTGC AAAATCCTTG ATCAAGCTGG
351 GTTTGGAGCG TTTCCACGGA GTTGGTATCC TGGGGTTTAA CTCTGCAGAG
401 TGGTTTATCA CTGCTGTTGG TGCCATCCTA GCCGGGGGTC TTTGTGTTGG
451 TATTTATGCC ACCAACTCTG CCGAGGCTTG TCAATATGTC ATCACTCATG
501 CCAAAGTGAA CATCTTGCTG GTTGAGAATG ATCAACAGTT ACAGAAAATC
551 CTTTCGATTC CACAGAGCAG CCTAGAGCCC CTAAAAGCGA TCATCCAGTA
601 CAGACTGCCA ATGAAGAAGA ACAACAACTT GTACTCTTGG GATGATTTCA
651 TGGAACTTGG CAGAAGTATC CCTGACACCC AACTGGAGCA GGTCATCGAG
701 AGCCAGAAGG CGAATCAATG CGCAGTGCTC ATCTACACTT CAGGGACCAC
751 AGGCATACCC AAGGGAGTGA TGCTCAGTCA TGACAACATC ACGTGGATTG
801 CAGGAGCAGT GACAAAGGAC TTTAAACTGA CAGACAAGCA TGAGACGGTG
851 GTTAGCTACC TCCCACTCAG CCATATTGCA GCACAGATGA TGGACATCTG
901 GGTACCCATA AAGATTGGGG CGCTCACATA CTTTGCTCAA GCAGATGCTC
951 TCAAGGGCAC CTTGGTAAGT ACTCTAAAGG AGGTAAAACC TACTGTCTTC
1001 ATTGGAGTGC CTCAAATTTG GGAGAAGATA CATGAGATGG TGAAGAAAAA
1051 TAGTGCCAAG TCCATGGGCT TGAAGAAGAA GGCATTCGTG TGGGCAAGAA
1101 ACATTGGCTT CAAGGTCAAC TCAAAAAAGA TGTTGGGGAA ATATAATACT
1151 CCCGTGAGCT ACCGCATGGC TAAGACTCTC GTGTTCAGCA AAGTCAAGAC
1201 ATCCCTTGGC TTGGATCACT GTCACTCTTT TATCAGTGGG ACTGCGCCCC
1251 TCAACCAAGA GACTGCCGAG TTCTTTCTAA GCTTGGACAT ACCTATAGGC
1301 GAGTTGTATG GGTTGAGTGA GAGCTCGGGA CCCCACACGA TATCCAACCA
1351 GAATAACTAC AGGCTTCTAA GCTGTGGCAA GATCTTGACT GGGTGTAAGA
1401 ATATGCTGTT CCAGCAGAAC AAGGATGGCA TTGGGGAGAT CTGCCTCTGG
1451 GGTAGGCACA TCTTCATGGG CTATCTGGAA AGTGAGACTG AAACTACAGA
1501 GGCCATCGAT GATGAAGGCT GGCTACACTC TGGGGATCTG GGCCAGCTGG
1551 ACGGTCTGGG TTTCCTCTAT GTCACCGGCC ACATCAAAGA AATCCTTATC
1601 ACTGCTGGTG GTGAAAATGT GCCCCCCATT CCTGTTGAGA CCTTGGTTAA
1651 GAAGAAGATC CCCATCATCA GTAACGCCAT GTTAGTAGGA GATAAACTGA
1701 AGTTTCTGAG CATGTTGCTG ACGCTGAAGT GTGAGATGAA TCAGATGAGC
1751 GGAGAACCTC TGGACAAGCT GAACTTCGAG GCCATCAACT TCTGTCGGGG
1801 TCTGGGCAGC CAGGCATCCA CCGTGACTGA GATGGTGAAG CAGCAAGACC
1851 CCCTGGTCTA CAAGGCCATC CAGCAAGGCA TCAATGCTGT GAACCAGGAA
1901 GCCATGAACA ATGCACAGAG GATTGAAAAG TGGGTCATCT TGGAGAAGGA
1951 CTTTTCCATC TATGGTGGAG AGCTAGGTCC AATGATGAAA CTTAAGAGAC
2001 ATTTTGTAGC CCAGAAATAC AAAAAACAAA TTGATCACAT GTACCACTGA
2051 CTGCTTTGAT GGAGCTGCTC TCAGCTGTTC TGATGCCTTC AGCAGGAAGA
2101 CCTCATTGCA ATAAGTGAAA TGCTGCTCTA GGTAGAAGCT CTCCCTGCTG
2151 TTTTTAAGAA GCCACATTCC TCATTGGTCA GTTTCTTGAT TGTTCGTCTG
2201 TTGGAGAGGT GCTCCCTAGA AGAACCTGCC ATACGTTTCA AAGCAATAAA
2251 ATCACTGTAT ATCTTTCTAA GGACCTTCAA GTCATGACTC CAGGGAAGCC
2301 TATTGGGAAG TCTACTAAAA ACTGCCTGAT TTACAAGAAA GACCTGAACT
2351 TGTGGGCTCC CATTTGATTT TTTTCTCCTC AGGGGACTCA GACATTAGAA

2401 AGAAAAAGCC TCACAGATTT GAAGAACTGG ACCCCCAAAT CAACTCACCT 

2451 GCCTGGAAGC AACTGGGAAA CCCTTCCAAT AAGTCCTGAT AATAAAGCAC 

2501 TTCAGGGTCC AAAAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 50 bp to 2047 bp; peptide length: 666 
Category: similarity to known protein 
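
The annotated start codon at bp 50 can be checked by translating the beginning of the insert. The sketch below assumes the standard genetic code and uses only the first 100 nucleotides of the cDNA as listed for this clone; the names `CODON_TABLE`, `translate` and `FIRST_100_NT` are illustrative and do not come from the original analysis pipeline.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, start: int) -> str:
    """Translate `dna` from 1-based position `start` until a stop codon
    or until fewer than three nucleotides remain."""
    residues = []
    for i in range(start - 1, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


# Nucleotides 1-100 of the insert (rows 1 and 51 of the listing).
FIRST_100_NT = (
    "CAGATGTCCCAGCTCCAGTGCTGTGGAGCATGGTTTCTGCACACCTGGAA"
    "TGACTGGAACCCCAAAGACTCAAGAAGGAGCTAAAGATCTTGAAGTAGAC"
)

print(translate(FIRST_100_NT, 50))  # MTGTPKTQEGAKDLEVD
```

The 17 residues obtained this way match the start of the peptide given for frame 2.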



1 MTGTPKTQEG AKDLEVDMNK TEVTPRLWTT CRDGEVLLRL SKHGPGHETP
51 MTIPEFFRES VNRFGTYPAL ASKNGKKWEI LNFNQYYEAC RKAAKSLIKL
101 GLERFHGVGI LGFNSAEWFI TAVGAILAGG LCVGIYATNS AEACQYVITH
151 AKVNILLVEN DQQLQKILSI PQSSLEPLKA IIQYRLPMKK NNNLYSWDDF
201 MELGRSIPDT QLEQVIESQK ANQCAVLIYT SGTTGIPKGV MLSHDNITWI
251 AGAVTKDFKL TDKHETVVSY LPLSHIAAQM MDIWVPIKIG ALTYFAQADA
301 LKGTLVSTLK EVKPTVFIGV PQIWEKIHEM VKKNSAKSMG LKKKAFVWAR
351 NIGFKVNSKK MLGKYNTPVS YRMAKTLVFS KVKTSLGLDH CHSFISGTAP
401 LNQETAEFFL SLDIPIGELY GLSESSGPHT ISNQNNYRLL SCGKILTGCK
451 NMLFQQNKDG IGEICLWGRH IFMGYLESET ETTEAIDDEG WLHSGDLGQL
501 DGLGFLYVTG HIKEILITAG GENVPPIPVE TLVKKKIPII SNAMLVGDKL
551 KFLSMLLTLK CEMNQMSGEP LDKLNFEAIN FCRGLGSQAS TVTEMVKQQD
601 PLVYKAIQQG INAVNQEAMN NAQRIEKWVI LEKDFSIYGG ELGPMMKLKR
651 HFVAQKYKKQ IDHMYH



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_35k16, frame 2

TREMBL:AB014531_1 gene: "KIAA0631"; product: "KIAA0631 protein"; Homo
sapiens mRNA for KIAA0631 protein, partial cds., N = 1, Score = 1641, P
= 8.9e-169

PIR:E70937 probable fadD15 - Mycobacterium tuberculosis (strain H37RV),
N = 2, Score = 532, P = 3.6e-62

PIR:H64041 long-chain-fatty-acid--CoA ligase homolog - Haemophilus
influenzae (strain Rd KW20), N = 2, Score = 486, P = 6.5e-59



>TREMBL:AB014531_1 gene: "KIAA0631"; product: "KIAA0631 protein"; Homo 
sapiens mRNA for KIAA0631 protein, partial cds. 
Length = 634 

HSPs: 



Score = 1641 (246.2 bits), Expect = 8.9e-169, P = 8.9e-169
Identities = 319/628 (50%), Positives = 440/628 (70%)
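
The percentages in the HSP header follow from the match counts. A minimal sketch, assuming the reported value is the percentage truncated to a whole number (this is an inference from the figures printed in this document, not a statement about the BLAST implementation); `blast_percent` is a hypothetical helper name.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Percentage of matching columns, truncated to an integer."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


print(blast_percent(319, 628))  # identities
print(blast_percent(440, 628))  # positives
```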



Query: 38 LRLSKHGPGHETPMTIPEFFRESVNRFGTYPALASKNGKKWEILNFNQYYEACRKAAKSL 97

LR+ P + P T+ F E+++++G AL K KWE ++++QYY R+AAK
Sbjct: 2 LRIDPSCP--QLPYTVHRMFYEALDKYGDLIALGFKRQDKWEHISYSQYYLLARRAAKGF 59

Query: 98 IKLGLERFHGVGILGFNSAEWFITAVGAILAGGLCVGIYATNSAEACQYVITHAKVNILL 157

+KLGL++ H V ILGFNS EWF +AVG + AGG+ GIY T+S EACQY+ N+++
Sbjct: 60 LKLGLKQAHSVAILGFNSPEWFFSAVGTVFAGGIVTGIYTTSSPEACQYIAYDCCANVIM 119

Query: 158 VENDQQLQKILSIPQSSLEPLKAIIQYRLPM-KKNNNLYSWDDFMELGRSIPDTQLEQVI 216

V+ +QL+KIL I L LKA++ Y+ P K N+Y+ ++FMELG +P+ L+ +I
Sbjct: 120 VDTQKQLEKILKI-WKQLPHLKAVVIYKEPPPNKMANVYTMEEFMELGNEVPEEALDAII 178

Query: 217 ESQKANQCAVLIYTSGTTGIPKGVMLSHDNITWIA--GAVTKDFKLTD-KHETVVSYLPL 273

++Q+ NQC VL+YTSGTTG PKGVMLS DNITW A G+ D + + + E VVSYLPL
Sbjct: 179 DTQQPNQCCVLVYTSGTTGNPKGVMLSQDNITWTARYGSQAGDIRPAEVQQEVVVSYLPL 238

Query: 274 SHIAAQMMDIWVPIKIGALTYFAQADALKGTLVSTLKEVKPTVFIGVPQIWEKIHEMVKK 333 

SHIAAQ+ D+W I+ GA FA+ DALKG+LV+TL+EV+PT +GVP++WEKI E +++
Sbjct: 239 SHIAAQIYDLWTGIQWGAQVCFAEPDALKGSLVNTLREVEPTSHMGVPRVWEKIMERIQE 298

Query: 334 NSAKSMGLKKKAFVWARNIGFKVNSKKMLGKYNTPVSYRMAKTLVFSKVKTSLGLDHCHS 393 

+A+S +++K +WA ++ + N G P + R+A LV +KV+ +LG C 

Sbjct: 299 VAAQSGFIRRKMLLWAMSVTLEQNLT-CPGSDLKPFTTRLADYLVLAKVRQALGFAKCQK 357 

Query: 394 FISGTAPLNQETAEFFLSLDIPIGELYGLSESSGPHTISNQNNYRLLSCGKILTGCKNML 453 

G AP+ ET FFL L+I + YGLSE+SGPH +S+ NYRL S GK++ GC+ L 
Sbjct: 358 NFYGAAPMMAETQHFFLGLNIRLYAGYGLSETSGPHFMSSPYNYRLYSSGKLVPGCRVKL 417 

Query: 454 FQQNKDGIGEICLWGRHIFMGYLESETETTEAIDDEGWLHSGDLGQLDGLGFLYVTGHIK 513 

Q+ +GIGEICLWGR IFMGYL E +T EAID+EGWLH+GD G+LD GFLY+TG +K 
Sbjct: 418 VNQDAEGIGEICLWGRTIFMGYLNMEDKTCEAIDEEGWLHTGDAGRLDADGFLYITGRLK 477 

Query: 514 EILITAGGENVPPIPVETLVKKKIPIISNAMLVGDKLKFLSMLLTLKCEMNQMSGEPLDK 573 

E++ITAGGENVPP+P+E VK ++PIISNAML+GD+ KFLSMLLTLKC ++ + + D 
Sbjct: 478 ELIITAGGENVPPVPIEEAVKMELPIISNAMLIGDQRKFLSMLLTLKCTLDPDTSDQTDN 537

Query: 574 LNFEAINFCRGLGSQASTVTEMVKQQDPLVYKAIQQGINAVNQEAMNNAQRIEKWVILEK 633 

L +A+ FC+ +GS+A+TV+E+++++D VY+AI++GI VN A I+KW ILE+ 

Sbjct: 538 LTEQAVEFCQRVGSRATTVSEIIEKKDEAVYQAIEEGIRRVNMNAAARPYHIQKWAILER 597

Query: 634 DFSIYGGELGPMMKLKRHFVAQKYKKQIDHMY 665

DFSI GGELGP MKLKR V +KYK ID Y 
Sbjct: 598 DFSISGGELGPTMKLKRLTVLEKYKGIIDSFY 629 



Pedant information for DKFZphtes3_35k16, frame 2

Report for DKFZphtes3_35k16.2



[LENGTH] 666
[MW] 74344.97
[pI] 8.67
[HOMOL] TREMBL:AB014531_1 gene: "KIAA0631"; product: "KIAA0631 protein"; Homo sapiens mRNA for KIAA0631 protein, partial cds. 1e-176
[FUNCAT] i lipid metabolism [H. influenzae, HI0002] 2e-55
[FUNCAT] 08.10 peroxisomal transport [S. cerevisiae, YER015w] 2e-29
[FUNCAT] 30.19 peroxisomal organization [S. cerevisiae, YER015w] 2e-29
[FUNCAT] 01.06.13 lipid and fatty-acid transport [S. cerevisiae, YER015w] 2e-29
[FUNCAT] 01.06.07 lipid, fatty-acid and sterol utilization [S. cerevisiae, YER015w] 2e-29
[FUNCAT] 01.06.01 lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YMR246w] 2e-23
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YMR246w] 2e-23
[BLOCKS] BL00455
[SCOP] d1lci 5.19.1.1.1 Luciferase [Firefly (Photinus pyralis)] 1e-49
[EC] 1.13.12.7 Photinus-luciferin 4-monooxygenase (ATP-hydrolysing) 9e-17
[EC] 6.2.1.3 Long-chain-fatty-acid--CoA ligase 4e-34
[EC] 5.1.1.11 Phenylalanine racemase (ATP-hydrolysing) 6e-08
[EC] 6.2.1.12 4-Coumarate--CoA ligase 8e-18
[PIRKW] duplication 6e-07
[PIRKW] phosphopantetheine 3e-12
[PIRKW] multifunctional enzyme 3e-06
[PIRKW] ligase 6e-08
[PIRKW] acid-thiol ligase 4e-34
[PIRKW] transmembrane protein 5e-22
[PIRKW] monooxygenase 9e-17
[PIRKW] hydrolase 4e-34
[PIRKW] peroxisome 9e-15
[PIRKW] antibiotic biosynthesis 3e-12
[PIRKW] isomerase 6e-08
[PIRKW] flavonoid biosynthesis 1e-17
[PIRKW] magnesium 9e-15
[PIRKW] ATP 5e-22
[PIRKW] oxidoreductase 9e-17
[PIRKW] liver 2e-31
[SUPFAM] alpha-aminoadipyl-cysteinyl-valine synthetase 3e-07
[SUPFAM] human long-chain-fatty-acid--CoA ligase 4e-34
[SUPFAM] gramicidin S synthetase I 6e-08
[SUPFAM] peptide synthetase ppsE 7e-06
[SUPFAM] gramicidin S synthetase I repeat homology 3e-12
[SUPFAM] peptide synthetase ppsD 2e-07
[SUPFAM] probable acyl-CoA ligase medium chain 2e-09
[SUPFAM] acetate--CoA ligase 8e-10
[SUPFAM] acetate--CoA ligase homology 4e-54
[SUPFAM] surfactin synthetase 3e-12
[SUPFAM] 4-coumarate--CoA ligase 8e-18
[SUPFAM] short-chain alcohol dehydrogenase homology 8e-07
[SUPFAM] acyl carrier protein homology 2e-29
[PROSITE] MYRISTYL 12
[PROSITE] AMP_BINDING 1
[PROSITE] AMIDATION 1
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 9
[PROSITE] TYR_PHOSPHO_SITE 3
[PROSITE] PKC_PHOSPHO_SITE 10
[PROSITE] ASN_GLYCOSYLATION 2
[PFAM] AMP-binding enzymes
[KW] Irregular
[KW] 3D
[KW] LOW_COMPLEXITY 1.80 %



SEQ MTGTPKTQEGAKDLEVDMNKTEVTPRLWTTCRDGEVLLRLSKHGPGHETPMTIPEFFRES

SEG

1lci-

SEQ VNRFGTYPALASKNGKKWEILNFNQYYEACRKAAKSLIKLGLERFHGVGILGFNSAEWFI

SEG

1lci-

SEQ TAVGAILAGGLCVGIYATNSAEACQYVITHAKVNILLVENDQQLQKILSIPQSSLEPLKA

SEG

1lci-

SEQ IIQYRLPMKKNNNLYSWDDFMELGRSIPDTQLEQVIESQKANQCAVLIYTSGTTGIPKGV

SEG

1lci-

SEQ MLSHDNITWIAGAVTKDFKLTDKHETVVSYLPLSHIAAQMMDIWVPIKIGALTYFAQADA

SEG

1lci-

SEQ LKGTLVSTLKEVKPTVFIGVPQIWEKIHEMVKKNSAKSMGLKKKAFVWARNIGFKVNSKK

SEG

1lci-

SEQ MLGKYNTPVSYRMAKTLVFSKVKTSLGLDHCHSFISGTAPLNQETAEFFLSLDIPIGELY

SEG

1lci- TTTTCEEETTTTCCCHHHHHHHHHHCCCCBCEE

SEQ GLSESSGPHTISNQNNYRLLSCGKILTGCKNMLFQQNKDGIGEICLWGRHIFMGYLESET

SEG

1lci- ECGGGTTEEEECCCCCCEEEEETTTTEEEEETTTTTCEETTEEEEEETTTTCCEETTTHH

SEQ ETTEAIDDEGWLHSGDLGQLDGLGFLYVTGHIKEILITAGGENVPPIPVETLVKKKIPII

SEG xxxxxxxxxxxx

1lci- HHHHHBTTTTCEEEEEEEEETTTTCEEE ECEEETTEEECHHHHHHHHHHT-TTE

SEQ SNAMLVGDKLKFLSMLLTLKCEMNQMSGEPLDKLNFEAINFCRGLGSQASTVTEMVKQQD

SEG

1lci- EEEEEEE

SEQ PLVYKAIQQGINAVNQEAMNNAQRIEKWVILEKDFSIYGGELGPMMKLKRHFVAQKYKKQ

SEG

1lci-

SEQ IDHMYH

SEG

1lci-



Prosite for DKFZphtes3_35k16.2

PS00001 19->23 ASN_GLYCOSYLATION PDOC00001
PS00001 246->250 ASN_GLYCOSYLATION PDOC00001
PS00004 332->336 CAMP_PHOSPHO_SITE PDOC00004
PS00005 4->7 PKC_PHOSPHO_SITE PDOC00005
PS00005 24->27 PKC_PHOSPHO_SITE PDOC00005
PS00005 30->33 PKC_PHOSPHO_SITE PDOC00005
PS00005 218->221 PKC_PHOSPHO_SITE PDOC00005
PS00005 261->264 PKC_PHOSPHO_SITE PDOC00005
PS00005 308->311 PKC_PHOSPHO_SITE PDOC00005
PS00005 335->338 PKC_PHOSPHO_SITE PDOC00005
PS00005 358->361 PKC_PHOSPHO_SITE PDOC00005
PS00005 370->373 PKC_PHOSPHO_SITE PDOC00005
PS00005 558->561 PKC_PHOSPHO_SITE PDOC00005
PS00006 30->34 CK2_PHOSPHO_SITE PDOC00006
PS00006 52->56 CK2_PHOSPHO_SITE PDOC00006
PS00006 173->177 CK2_PHOSPHO_SITE PDOC00006
PS00006 196->200 CK2_PHOSPHO_SITE PDOC00006
PS00006 206->210 CK2_PHOSPHO_SITE PDOC00006
PS00006 210->214 CK2_PHOSPHO_SITE PDOC00006
PS00006 308->312 CK2_PHOSPHO_SITE PDOC00006
PS00006 478->482 CK2_PHOSPHO_SITE PDOC00006
PS00006 591->595 CK2_PHOSPHO_SITE PDOC00006
PS00007 659->666 TYR_PHOSPHO_SITE PDOC00007
PS00007 658->666 TYR_PHOSPHO_SITE PDOC00007
PS00007 597->605 TYR_PHOSPHO_SITE PDOC00007
PS00008 3->9 MYRISTYL PDOC00008
PS00008 65->71 MYRISTYL PDOC00008
PS00008 124->130 MYRISTYL PDOC00008
PS00008 130->136 MYRISTYL PDOC00008
PS00008 134->140 MYRISTYL PDOC00008
PS00008 235->241 MYRISTYL PDOC00008
PS00008 239->245 MYRISTYL PDOC00008
PS00008 303->309 MYRISTYL PDOC00008
PS00008 387->393 MYRISTYL PDOC00008
PS00008 421->427 MYRISTYL PDOC00008
PS00008 498->504 MYRISTYL PDOC00008
PS00008 586->592 MYRISTYL PDOC00008
PS00009 74->78 AMIDATION PDOC00009
PS00455 227->239 AMP_BINDING PDOC00427



Pfam for DKFZphtes3_35k16.2



HMM_NAME AMP-binding enzymes 

HMM *TYRELNERANRLARHLRsekGIrPGDiVgIMMDRSMWMIVaMLGIWKAG 
+ + +E +A L+ +G VGI+ +S + ++ G + AG 

Query 82 NFNQYYEACRKAAKSLI-KLGLERFHGVGILGFNSAEWFITAVGAILAG 129 

HMM GAYVPIDPeYPdERIqYMLEDSGArLLITQrh HmqRI PdemwwvdH 

G +V I +E QY++ ++ + +L+++ + + IP++++ + 

Query 130 GLCVGIYATNSAEACQYVITHAKVNILLVENDQQLQKILSIPQSSLEPLK 179 

HMM IiviDWe WddlWWHedeeNpqpWvdPeDLAYIIY 

+I++ + + ++++ + E ++ ++++ A +IY 

Query 180 AIIQYRLPMKKNNNLYSWDDFMELGRSIPDTQLEQVIESQKANQCAVLIY 229 

HMM TSGTTGKPKGVMIEHrNIvNycqWMnWRYgMteeDDRILWFtSDpYWFDa 
TSGTTG PKGVM++H NI+ + +++ +T+ +++ + + ++ A 
Query 230 TSGTTGIPKGVMLSHDNITWIAGAVTKDFKLTDKHETVVSYLP-LSHIAA 278

HMM SVWDMFWpLLnGaTLYIpPeEtRrDPerWWqYIqRHglTWWylTPSMFRM 

+++D++ P+ GA Y + ++ + ++++ ++T+ ++P +++ 

Query 279 QMMDIWVPIKIGALTYFAQADAL--KGTLVSTLKEVKPTVFIGVPQIWEK 326 

HMM LMpd 

+ + 

Query 327 IHEMVKKNSAKSMGLKKKAFVWARNIGFKVNSKKMLGKYNTPVSYRMAKT 376 

HMM psLRhVMFgGEpLsPehWdWWRJcrf gf kgRI INMYWPT 

++ + +++G PL++E+++ ++ + ++I Y+ + 
Query 377 LVFSKVKTSLGLDHCHSFISGTAPLNQETAEFFL-SLD--IPIGELYGLS 423

HMM ETTVWtTwMrliPdepeqWrwiPIGRPIpNTqWYIMDdnMQIQPiGViGE 
E++ T+ + + R +++G+ + + + + +N G IGE 

Query 424 ESSGPHTISNQNN--Y RLLSCGKILTGCKNMLFQQN KDG-IGE 463 

HMM LY I g GW PG V ARG YWNR P E LT EER Fi pN P FW PGE Y R r GWN r RM Y RT G DL AR 
+++ G ++ GY+ + +T E+ + ++ ++GDL++ 
Query 464 ICLWG-RHIFMGYLESETETTEAIDDEGW LHSGDLGQ 499

HMM WIPDGnlEYLGRID . DQVKIRGYRIELGEIEhqLr . qHPglqEAVV* 

+ G+++ G I + G+++ + +E+ + ++P 1+ A 
Query 500 LDGLGFLYVTGHIKEILITAGGENVPPIPVETLVKKKIPIISNAML 545 
DKFZphtes3_35k24



group: transmembrane protein 

DKFZphtes3_35k24 encodes a novel 514 amino acid protein without similarity to known proteins. 
The novel protein contains 5 transmembrane regions. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific 
genes and as a new marker for testicular cells. 

unknown ; 

membrane regions: 5 

Summary: DKFZphtes3_35k24 encodes a novel 514 amino acid protein.
No homologues found in bacteria, yeast or C. elegans; specific for
mammals?

unknown 

complete cDNA, complete cds, few EST hits 

Sequenced by DKFZ 

Locus : unknown 

Insert length: 2706 bp 

Poly A stretch at pos. 2696, polyadenylation signal at pos. 2675 

1 CCGTGTGCAG TCGCCCCGCG CCCCGCGCGA CCCTTCGGGT AAACTACGAA 
51 CTGGGAGTTC TGAAGAATGG GTAAAGACTT TCGTTACTAT TTCCAGCATC 

101 CCTGGTCTCG CATGATTGTG GCTTACTTGG TGATCTTCTT TAACTTCTTA 

151 ATATTTGCGG AGGACCCAGT TTCTCATAGC CAAACAGAAG CCAATGTTAT 

201 TGTTGTTGGA AACTGTTTTT CATTTGTTAC AAATAAATAC CCTAGAGGAG 

251 TTGGCTGGAG GATTTTGAAG GTGCTTCTAT GGCTACTTGC CATTCTCACA 

301 GGACTAATAG CTGGCAAATT TCTGTTCCAT CAGCGTTTGT TTGGTCAGTT 

351 GCTCCGATTA AAAATGTTTC GAGAAGATCA TGGGTCGTGG ATGACAATGT 

401 TCTTCAGCAC AATTCTCTTT CTCTTCATAT TTTCTCACAT ATACAACACG 

451 ATTCTTCTAA TGGATGGGAA CATGGGAGCA TATATCATTA CAGACTATAT 

501 GGGCATCCGA AATGAAAGTT TCATGAAATT AGCTGCAGTA GGGACCTGGA 

551 TGGGGGACTT TGTCACAGCT TGGATGGTCA CTGATATGAT GCTTCAGGAC 

601 AAACCCTATC CTGACTGGGG AAAATCAGCA AGAGCTTTCT GGAAGAAAGG 

651 AAATGTTAGG ATCACTTTAT TCTGGACAGT TCTTTTTACT CTGACGTCTG 

701 TGGTTGTACT TGTGATTACA ACGGACTGGA TCAGCTGGGA CAAGCTGAAT 

751 CGGGGATTTT TGCCCAGTGA TGAAGTTTCC AGAGCATTCC TTGCTTCTTT 

801 TATCTTGGTC TTTGACCTTC TTATTGTGAT GCAGGACTGG GAATTCCCAC 

851 ATTTCATGGG AGATGTTGAT GTAAATCTCC CTGGTTTGCA CACCCCTCAC 

901 ATGCAGTTCA AGATTCCTTT CTTCCAGAAA ATCTTCAAGG AGGAATATCG 

951 TATTCACATA ACAGGCAAAT GGTTTAACTA TGGAATTATC TTCCTCGTCT 
1001 TGATTTTGGA TCTTAATATG TGGAAGAACC AAATATTTTA TAAACCTCAT 
1051 GAATATGGGC AATATATCGG CCCGGGGCAG AAGATATATA CAGTGAAAGA 
1101 CTCAGAAAGT TTAAAAGATT TGAACAGAAC CAAGCTATCC TGGGAATGGA 
1151 GGTCCAATCA CACTAACCCT CGGACTAATA AAACATATGT TGAGGGAGAC 
1201 ATGTTCTTAC ACAGCAGGTT CATAGGAGCC AGTCTTGATG TCAAGTGTCT 
1251 GGCCTTTGTT CCAAGCCTGA TAGCCTTTGT GTGGTTTGGA TTCTTTATTT 
1301 GGTTCTTTGG ACGATTTTTG AAAAATGAGC CACGCATGGA GAATCAAGAC 
1351 AAAACTTACA CTCGCATGAA AAGAAAATCT CCATCAGAAC ATAGCAAAGA 
1401 CATGGGAATC ACTCGAGAAA ACACCCAGGC TTCAGTAGAA GACCCCTTGA 
1451 ATGACCCTTC TTTGGTTTGC ATCAGGTCTG ACTTCAATGA GATCGTCTAC 
1501 AAGTCTTCCC ACCTAACCTC GGAAAACTTG AGCTCACAGT TGAACGAATC 
1551 TACTAGTGCA ACAGAAGCTG ATCAAGACCC AACGACTTCT AAAAGTACAC 
1601 CTACGAACTA GACTCGGAGA TAGACTTGGA GATAACACAA AAAGCAACCT 
1651 TGAGTGTAAC TTTAAAAATT TAGTCTTTCC TTTTGTATAT GTAAGGTTTA 
1701 CGTAGTGTTA GGTAAAAATA TGAACAATGC CACAACGGTG CTCAACATGC 
1751 TTTTTCTAGG ATTCATTGTT TTCTATTTGT ATTATAATAC ACGTGCCTAC 
1801 TGTATACTCA ACAGTCCTCT AGAGATTGCT TTTCACAATT GCACAAGCTA 
1851 TTACTGACTT TACAGCATAG TGGAAGATTA GCTGATGACC CATGTATCTG 
1901 ATGTTCAACC ATAGTGGTGC CTTGAGACAT TAAACTGTTT TTAACTGTAC 
1951 CAGAAATGAA GTGTGGAACA GTTACCTAAC CTATTTCACA TGGGCGTTTT 
2001 GTATACAACT ATTTTGATCT ACACTTGATG TCTGAGCAGA AAACAGAAAT 
2051 AGCTAAATGT GACTCAGGAA GTATCTCTTG GTTTCTTATT CAGCAGCAGA 
2101 GTTGGTGACT TTGACAACTG GACTGCAGAG AAACATGGTG ATCACCTTTT 
2151 AATTTTTATT GGCTGTCTGC CAAATATAAA TACAGATGCA AAATTCAGTA 
2201 ATAGGAGATC CATAACCCAA CATGGGTCAC TACTCGTGAA ATGTGACTTT 
2251 CTCCCACCAG TAATTGAAAT GAGGTGATGA TACCTAATTA TGTTTTCCTA 
2301 ATTAAAGATA AATTGCTACT TGATTAAAAA TCCTGCCCTT CACCTTTGGG 
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2351 AACAAAGGTT AAGAGACACA GTTGGGCGAA CTCTCAAATT TATTGGCATT 

2401 TACACAAAGT CCCAGACAAC CAAGGAACTG AAGTTTTCAT CATATGAGAG 

2451 CAGCACATCC CACCATTTAC AATATTCGTA TATCTTTCTG CAAATATGGC 

2501 TCTGGATAGT GAAAATTGAA AAACATATGC CAACCCTGAG CAAGGGAACT 

2551 CCTCAAAAAA TCATGCAGCG GAACCTTGTC AGGTAGAGAA GCCGTGCATG 

2601 AAAGAATTTG TTTAATGTCT TGTTTTGCGT ATGTGTTTTT TGTTTTTGTT 

2651 TTTTAAGAAC TAAATATTGC ACATTAATAA ATAAGAATTA TACAGCAAAA 

2701 AAAAAA 
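
The poly(A) annotations for this insert can be cross-checked against the last two rows of the sequence. The sketch below assumes the canonical AATAAA hexamer as the polyadenylation signal and treats the reported positions as 0-based offsets (an interpretation that happens to reproduce the figures given for this clone, not a documented convention); `polya_features` and `TAIL` are hypothetical names.

```python
def polya_features(tail: str, first_position: int) -> tuple[int, int]:
    """Return 0-based offsets of the first AATAAA hexamer and of the
    start of the terminal A stretch.

    `tail` is the 3' end of the cDNA and `first_position` is the 1-based
    coordinate of its first nucleotide in the full insert.
    """
    offset = first_position - 1            # convert to 0-based
    signal = tail.find("AATAAA")
    if signal < 0:
        raise ValueError("no AATAAA hexamer found")
    stretch = len(tail.rstrip("A"))        # index where the A run begins
    return offset + signal, offset + stretch


# Rows 2651 and 2701 of the listing (nucleotides 2651-2706).
TAIL = "TTTTAAGAACTAAATATTGCACATTAATAAATAAGAATTATACAGCAAAA" "AAAAAA"

print(polya_features(TAIL, 2651))  # (2675, 2696)
```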



BLAST Results 

No BLAST result 

Medline entries 

No Medline entry 



Peptide information for frame 1 



ORF from 67 bp to 1608 bp; peptide length: 514 
Category: putative protein 



1 MGKDFRYYFQ HPWSRMIVAY LVIFFNFLIF AEDPVSHSQT EANVIVVGNC
51 FSFVTNKYPR GVGWRILKVL LWLLAILTGL IAGKFLFHQR LFGQLLRLKM
101 FREDHGSWMT MFFSTILFLF IFSHIYNTIL LMDGNMGAYI ITDYMGIRNE
151 SFMKLAAVGT WMGDFVTAWM VTDMMLQDKP YPDWGKSARA FWKKGNVRIT
201 LFWTVLFTLT SVVVLVITTD WISWDKLNRG FLPSDEVSRA FLASFILVFD
251 LLIVMQDWEF PHFMGDVDVN LPGLHTPHMQ FKIPFFQKIF KEEYRIHITG
301 KWFNYGIIFL VLILDLNMWK NQIFYKPHEY GQYIGPGQKI YTVKDSESLK 
351 DLNRTKLSWE WRSNHTNPRT NKTYVEGDMF LHSRFIGASL DVKCLAFVPS 
401 LIAFVWFGFF IWFFGRFLKN EPRMENQDKT YTRMKRKSPS EHSKDMGITR 
451 ENTQASVEDP LNDPSLVCIR SDFNEIVYKS SHLTSENLSS QLNESTSATE 
501 ADQDPTTSKS TPTN 

BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35k24, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_35k24, frame 1 

Report for DKFZphtes3_35k24.1



[LENGTH] 514
[MW] 60185.03
[pI] 8.67
[PROSITE] MYRISTYL 5
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 8
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] PKC_PHOSPHO_SITE 7
[PROSITE] ASN_GLYCOSYLATION 6
[KW] SIGNAL_PEPTIDE 32
[KW] TRANSMEMBRANE 5
[KW] LOW_COMPLEXITY 15.37 %



SEQ MGKDFRYYFQHPWSRMIVAYLVIFFNFLIFAEDPVSHSQTEANVIVVGNCFSFVTNKYPR

SEG 

PRD cccceeeeeecccchhhhhhhhhhhhhhhhccccccccccceeeeeecccceeeeccccc 

MEM 

SEQ GVGWRILKVLLWLLAILTGLIAGKFLFHQRLFGQLLRLKMFREDHGSWMTMFFSTILFLF 

SEG xxxxxxxxxxxxxxxxx xxxxxxxxxxxx

PRD cchhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhccccceeeehhhhhhhh 

MEM MMMMMMMMMMMMMMMMM MMMMM 
SEQ IFSHIYNTILLMDGNMGAYIITDYMGIRNESFMKLAAVGTWMGDFVTAWMVTDMMLQDKP

SEG xxx 

PRD hhhhhhhhhhccccccceeeeecccccchhhhhhhhhhccccccccchhhhhhhhhhccc 

MEM MMMMMMMMMMMM 

SEQ YPDWGKSARAFWKKGNVRITLFWTVLFTLTSVVVLVITTDWISWDKLNRGFLPSDEVSRA 

SEG xxxxxxxxxxxxxxxxxxxxx 

PRD cccccchhhhhhhcccceeehhhhhhhhhhhheeeeecccccccccccccccccchhhhh 

MEM MMMMMMMMMMMMMMMMM M 

SEQ FLASFILVFDLLIVMQDWEFPHFMGDVDVNLPGLHTPHMQFKIPFFQKIFKEEYRIHITG 

SEG xxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhcccccccccccccccccccccccccchhhhhhhhhhhhhhcccc 

MEM MMMMMMMMMMMMMMMM 

SEQ KWFNYGIIFLVLILDLNMWKNQIFYKPHEYGQYIGPGQKIYTVKDSESLKDLNRTKLSWE 

SEG 

PRD ccceeeeeehhhhhhhcccccceeeccccccccccccceeeeecccccccccccchhhhh 

MEM 

SEQ WRSNHTNPRTNKTYVEGDMFLHSRFIGASLDVKCLAFVPSLIAFVWFGFFIWFFGRFLKN 

SEG xxxxxxxxxxxxxx . . . 

PRD hhcccccccccccccccchhhhhhccccccceeeeeehhhhheeeeccceeeeeeeeccc 

MEM MMMMMMMMMMMMMMMMM 

SEQ EPRMENQDKTYTRMKRKSPSEHSKDMGITRENTQASVEDPLNDPSLVCIRSDFNEIVYKS 

SEG 

PRD cccccccccchhhhhhccccccccccceeeccccccccccccccceeeeccccceeeeec 

MEM 

SEQ SHLTSENLSSQLNESTSATEADQDPTTSKSTPTN 

SEG 

PRD cccccccccccccccccccccccccccccccccc 

MEM 



Prosite for DKFZphtes3_35k24.1

PS00001 149->153 ASN_GLYCOSYLATION PDOC00001
PS00001 353->357 ASN_GLYCOSYLATION PDOC00001
PS00001 364->368 ASN_GLYCOSYLATION PDOC00001
PS00001 371->375 ASN_GLYCOSYLATION PDOC00001
PS00001 487->491 ASN_GLYCOSYLATION PDOC00001
PS00001 493->497 ASN_GLYCOSYLATION PDOC00001
PS00004 435->439 CAMP_PHOSPHO_SITE PDOC00004
PS00005 55->58 PKC_PHOSPHO_SITE PDOC00005
PS00005 187->190 PKC_PHOSPHO_SITE PDOC00005
PS00005 299->302 PKC_PHOSPHO_SITE PDOC00005
PS00005 342->345 PKC_PHOSPHO_SITE PDOC00005
PS00005 348->351 PKC_PHOSPHO_SITE PDOC00005
PS00005 370->373 PKC_PHOSPHO_SITE PDOC00005
PS00005 507->510 PKC_PHOSPHO_SITE PDOC00005
PS00006 38->42 CK2_PHOSPHO_SITE PDOC00006
PS00006 342->346 CK2_PHOSPHO_SITE PDOC00006
PS00006 348->352 CK2_PHOSPHO_SITE PDOC00006
PS00006 373->377 CK2_PHOSPHO_SITE PDOC00006
PS00006 438->442 CK2_PHOSPHO_SITE PDOC00006
PS00006 456->460 CK2_PHOSPHO_SITE PDOC00006
PS00006 497->501 CK2_PHOSPHO_SITE PDOC00006
PS00006 499->503 CK2_PHOSPHO_SITE PDOC00006
PS00007 326->334 TYR_PHOSPHO_SITE PDOC00007
PS00008 48->54 MYRISTYL PDOC00008
PS00008 79->85 MYRISTYL PDOC00008
PS00008 106->112 MYRISTYL PDOC00008
PS00008 134->140 MYRISTYL PDOC00008
PS00008 159->165 MYRISTYL PDOC00008

(No Pfam data available for DKFZphtes3_35k24.1)
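
The ASN_GLYCOSYLATION entries (PS00001) correspond to the PROSITE pattern N-{P}-[ST]-{P}. As an illustration, the sketch below scans residues 341-380 of the DKFZphtes3_35k24 peptide with an equivalent regular expression; the helper name `n_glyc_sites` and the choice of fragment are assumptions made for this example, and only the start position of each match is reported.

```python
import re

# PROSITE PS00001: N-{P}-[ST]-{P}. The lookahead keeps overlapping sites.
N_GLYC = re.compile(r"N(?=[^P][ST][^P])")


def n_glyc_sites(peptide: str, first_residue: int = 1) -> list[int]:
    """Return 1-based positions of the Asn in each N-{P}-[ST]-{P} match.

    `first_residue` is the position of peptide[0] in the full protein.
    """
    return [m.start() + first_residue for m in N_GLYC.finditer(peptide)]


# Residues 341-380 of the peptide listed for frame 1.
FRAGMENT = "YTVKDSESLKDLNRTKLSWEWRSNHTNPRTNKTYVEGDMF"

print(n_glyc_sites(FRAGMENT, first_residue=341))  # [353, 364, 371]
```

The asparagine at position 367 is followed by proline and is therefore not reported, which is consistent with the three sites listed for this region.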
DKFZphtes3_35n12
group: metabolism

DKFZphtes3_35n12 encodes a novel 315 amino acid protein with strong similarity to ADP, ATP
carrier T (ANT) proteins.

The novel protein contains three mitochondrial energy transfer signatures and is closely
related to the ADP/ATP translocator, or adenine nucleotide translocator (ANT), a protein most
abundant in mitochondria. In its functional state, it is a homodimer of 30-kD subunits
embedded asymmetrically in the inner mitochondrial membrane. The dimer forms a gated pore
through which ADP is moved from the cytoplasm into the matrix in exchange for ATP.

The new protein can find application in modulation of ADP transport and energy metabolism in
cells/mitochondria.

strong similarity to ADP/ATP carrier proteins 

EST hits to mouse and Drosophila

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1803 bp 

Poly A stretch at pos. 1793, polyadenylation signal at pos. 1772

1 AGCGTCCCAA GAGCCACTTT CTCGCCAGTA CGATGCTGCA GCGGTTTTCC 

51 GGTTTTCCGC TTCCCTTCAT CGTAGCTCCC GTACTCATTT TTAGCCACTG 

101 CTGCCGGTTT TTATATCCTT CTCCATCATG CATCGTGAGC CTGCGAAAAA 

151 GAAGGCAGAA AAGCGGCTGT TTGACGCCTC ATCCTTCGGG AAGGACCTTC 

201 TGGCCGGCGG AGTCGCGGCA GCTGTGTCCA AGACAGCGGT GGCGCCCATC 

251 GAGCGGGTGA AGCTGCTGCT GCAGGTGCAG GCGTCGTCGA AGCAGATCAG 

301 CCCCGAGGCG CGGTACAAAG GCATGGTGGA CTGCCTGGTG CGGATTCCTC 

351 GCGAGCAGGG TTTCTTCAGT TTTTGGCGTG GCAATTTGGC AAATGTTATT 

401 CGGTATTTTC CAACACAAGC TCTAAACTTT GCTTTTAAGG ACAAATACAA 

451 GCAGCTATTC ATGTCTGGAG TTAATAAAGA AAAACAGTTC TGGAGGTGGT 

501 TTTTGGCAAA CCTGGCTTCT GGTGGAGCTG CTGGGGCAAC ATCCTTATGT 

551 GTAGTATATC CTCTAGATTT TGCCCGAACC CGATTAGGTG TCGATATTGG 

601 AAAAGGTCCT GAGGAGCGAC AATTCAAGGG TTTAGGTGAC TGTATTATGA 

651 AAATAGCAAA ATCAGATGGA ATTGCTGGTT TATACCAAGG GTTTGGTGTT 

701 TCAGTACAGG GCATCATTGT GTACCGAGCC TCTTATTTTG GAGCTTATGA 

751 CACAGTTAAG GGTTTATTAC CAAAGCCAAA GAAAACTCCA TTTCTTGTCT 

801 CCTTTTTCAT TGCTCAAGTT GTGACTACAT GCTCTGGAAT ACTTTCTTAT 

851 CCCTTTGACA CAGTTAGAAG ACGTATGATG ATGCAGAGTG GTGAGGCTAA 

901 ACGGCAATAT AAAGGAACCT TAGACTGCTT TGTGAAGATA TACCAACATG 

951 AAGGAATCAG TTCCTTTTTT CGTGGCGCCT TCTCCAATGT TCTTCGCGGT 

1001 ACAGGGGGTG CTTTGGTGTT GGTATTATAT GATAAAATTA AAGAATTCTT 

1051 TCATATTGAT ATTGGTGGTA GGTAATCGGG AGAGTAAATT AAGAAATAAC 

1101 ATGGATTTAA CTTGTTAAAC ATACAAATTA CATAGCTGCC ATTTGCATAC 

1151 ATTTTGATAG TGTTATTGTC TGTATTTTGT TAAAGTGCTA GTTCTGCAAT 

1201 AAAGCATACA TTTTTTCAAG AATTTAAATA CTAAAAATCA GATAAATGTG 

1251 GATTTTCCTC CCACTTAGAC TCAAACACAT TTTAGTGTGA TATTTCATTT 

1301 ATTATAGGTA GTATATTTTA ATTTGTTAGT TTAAAATTCT TTTTATGATT 

1351 AAAAATTAAT CATATAATCC TAGATTAATG CTGAAATCTA GGAAATGAAA 

1401 GTAGCGTCTT TTAAATTGCT ATTCATTTAA TATACCTGTT TTCCCATCTT 

1451 TTGAAGTCAT ATGGTATGAC ATATTTCTTA AAAGCTTATC AATAGATGTC 

1501 ATCATATGTG TAGGCAGAAA TAAGCTTTGT TCTATATCTC TTCTAAGACA 

1551 GTTGTTATTA CTGTGTATAA TATTTACAGT ATCAGCCTTT GATTATAGAT 

1601 GTGATCATTT AAAATTTGAT AATGACTTTA GTGACATTAT AAAACTGAAA 

1651 CTGGAAAATA AAATGGCTTA TCTGCTGATG TTTATCTTTA AAATAAATAA 

1701 AATCTTGCTA GTGTGAATAT ATCTTAGAAC AAAAGGTATC CTCTTGAAAA 

1751 TTAGTTTGTA TATTTTGTTG ACAATAAAGG AAGCTTAACT GTTAAAAAAA 
1801 AAA 



BLAST Results 



No BLAST result 



Medline entries 



96269608: 

Molecular biological and quantitative abnormalities of 
ADP/ATP carrier protein in cardiomyopathic hamsters. 
Peptide information for frame 2 



ORF from 128 bp to 1072 bp; peptide length: 315 

Category: strong similarity to known protein 

Classification: Metabolism 

Prosite motifs: MITOCH_CARRIER (40-50) 
MITOCH_CARRIER (145-155) 
MITOCH_CARRIER (242-252) 



1 MHREPAKKKA EKRLFDASSF GKDLLAGGVA AAVSKTAVAP IERVKLLLQV 
51 QASSKQISPE ARYKGMVDCL VRIPREQGFF SFWRGNLANV IRYFPTQALN 
101 FAFKDKYKQL FMSGVNKEKQ FWRWFLANLA SGGAAGATSL CVVYPLDFAR 
151 TRLGVDIGKG PEERQFKGLG DCIMKIAKSD GIAGLYQGFG VSVQGIIVYR 
201 ASYFGAYDTV KGLLPKPKKT PFLVSFFIAQ VVTTCSGILS YPFDTVRRRM 
251 MMQSGEAKRQ YKGTLDCFVK IYQHEGISSF FRGAFSNVLR GTGGALVLVL 
301 YDKIKEFFHI DIGGR 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35nl2, frame 2 

PIR:S37210 ADP, ATP carrier protein T1 - mouse, N = 1, Score = 1127, P = 2.7e-114 

PIR:A44778 ADP, ATP carrier protein T1 - human, N = 1, Score = 1125, P = 4.4e-114 

TREMBL:DMADPATPT_2 product: "ADP/ATP translocase"; Drosophila melanogaster gene encoding ADP/ATP translocase, N = 1, Score = 1124, P = 5.6e-114 

PIR:XWBO ADP, ATP carrier protein T1 - bovine, N = 1, Score = 1121, P = 1.2e-113 



>PIR:S37210 ADP, ATP carrier protein T1 - mouse 
Length = 298 

HSPs: 

Score = 1127 (169.1 bits), Expect = 2.7e-114, P = 2.7e-114 
Identities = 214/293 (73%), Positives = 248/293 (84%) 



Query:  17 ASSFGKDLLAGGVAAAVSKTAVAPIERVKLLLQVQASSKQISPEARYKGMVDCLVRIPRE 76 
           A SF KD LAGG+AAAVSKTAVAPIERVKLLLQVQ +SKQIS E +YKG++DC+VRIP+E 
Sbjct:   5 ALSFLKDFLAGGIAAAVSKTAVAPIERVKLLLQVQHASKQISAEKQYKGIIDCVVRIPKE 64 

Query:  77 QGFFSFWRGNLANVIRYFPTQALNFAFKDKYKQLFMSGVNKEKQFWRWFLANLASGGAAG 136 
           QGF SFWRGNLANVIRYFPTQALNFAFKDKYKQ+F+ GV++ KQFWR+F  NLASGGAAG 
Sbjct:  65 QGFLSFWRGNLANVIRYFPTQALNFAFKDKYKQIFLGGVDRHKQFWRYFAGNLASGGAAG 124 

Query: 137 ATSLCVVYPLDFARTRLGVDIGKGPEERQFKGLGDCIMKIAKSDGIAGLYQGFGVSVQGI 196 
           ATSLC VYPLDFARTRL  D+GKG  +R+F GLGDC+ KI KSDG+ GLYQGF VSVQGI 
Sbjct: 125 ATSLCFVYPLDFARTRLAADVGKGSSQREFNGLGDCLTKIFKSDGLKGLYQGFSVSVQGI 184 

Query: 197 IVYRASYFGAYDTVKGLLPKPKKTPFLVSFFIAQVVTTCSGILSYPFDTVRRRMMMQSGE 256 
           I+YRA+YFG YDT KG+LP PK    +VS+ IAQ VT  +G++SYPFDTVRRRMMMQSG 
Sbjct: 185 IIYRAAYFGVYDTAKGMLPDPKNVHIIVSWMIAQSVTAVAGLVSYPFDTVRRRMMMQSGR 244 

Query: 257 --AKRQYKGTLDCFVKIYQHEGISSFFRGAFSNVLRGTGGALVLVLYDKIKEF 307 
             A   Y GTLDC+ KI + EG ++FF+GA+SNVLRG GGA VLVLYD+IK++ 
Sbjct: 245 KGADIMYTGTLDCWRKIAKDEGANAFFKGAWSNVLRGMGGAFVLVLYDEIKKY 297 
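
As a rough cross-check of the figures in this HSP, the identity count of one 60-column block and the reported percentages can be recomputed. The sketch below is illustrative only: the helper names `count_identities` and `percent_floor` are hypothetical, the two rows are the first block of the alignment (query 17-76, subject 5-64), and it assumes the report truncates percentages instead of rounding them, which is what 248/293 (about 84.6%) shown as 84% suggests.

```python
def count_identities(query: str, sbjct: str) -> int:
    """Count aligned columns where both rows carry the same residue.

    Gap characters ('-') never count as identities.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")


def percent_floor(part: int, total: int) -> int:
    """Integer percentage, truncated (assumed to mirror the report)."""
    return part * 100 // total


# First alignment block of the HSP against PIR:S37210.
query_17_76 = "ASSFGKDLLAGGVAAAVSKTAVAPIERVKLLLQVQASSKQISPEARYKGMVDCLVRIPRE"
sbjct_5_64 = "ALSFLKDFLAGGIAAAVSKTAVAPIERVKLLLQVQHASKQISAEKQYKGIIDCVVRIPKE"

block_identities = count_identities(query_17_76, sbjct_5_64)

# Totals as printed in the HSP header.
identity_pct = percent_floor(214, 293)
positive_pct = percent_floor(248, 293)
```

Summing `count_identities` over all five blocks should reproduce the 214 identities given in the header.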





Pedant information for DKFZphtes3_35nl2, frame 2 



Report for DKFZphtes3_35nl2.2 

[LENGTH] 315 
[MW] 35022.03 
[pI] 9.91 
[HOMOL] PIR:S37210 ADP, ATP carrier protein T1 - mouse 1e-115 
[FUNCAT] 07.16 purine and pyrimidine transporters [S. cerevisiae, YBL030c] 2e-72 
[FUNCAT] 08.04 mitochondrial transport [S. cerevisiae, YBL030c] 2e-72 
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YBL030c] 2e-72 
[FUNCAT] 01.03.19 nucleotide transport [S. cerevisiae, YBL030c] 2e-72 
[FUNCAT] 01.07.10 transport of vitamins, cofactors, and prosthetic groups [S. cerevisiae, YIL006w] 2e-14 
[FUNCAT] 07.99 other transport facilitators [S. cerevisiae, YIL006w] 2e-14 
[FUNCAT] 01.05.07 carbohydrate transport [S. cerevisiae, YPR021c] 5e-14 
[FUNCAT] 07.07 sugar and carbohydrate transporters [S. cerevisiae, YPR021c] 5e-14 
[FUNCAT] 07.04.07 anion transporters (cl, so4, po4, etc.) [S. cerevisiae, YKL120w] 1e-13 
[FUNCAT] 02.13 respiration [S. cerevisiae, YBR192w] 4e-13 
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YJR095w] 6e-12 
[FUNCAT] 13.04 homeostasis of other ions [S. cerevisiae, YLR348c] 4e-10 
[FUNCAT] 01.04.07 phosphate transport [S. cerevisiae, YLR348c] 4e-10 
[FUNCAT] 01.01.07 amino-acid transport [S. cerevisiae, YOR130c] 1e-06 
[FUNCAT] 07.10 amino-acid transporters [S. cerevisiae, YOR130c] 1e-06 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPR128c] 2e-06 
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YKR052c] 2e-06 
[BLOCKS] BL00215B Mitochondrial energy transfer proteins 
[BLOCKS] BL00215A Mitochondrial energy transfer proteins 
[PIRKW] duplication 1e-115 
[PIRKW] phosphate transport 2e-09 
[PIRKW] heart 3e-24 
[PIRKW] transmembrane protein 1e-115 
[PIRKW] mitochondrial inner membrane 7e-72 
[PIRKW] transport protein 4e-08 
[PIRKW] acetylated amino end 1e-115 
[PIRKW] adipose tissue 5e-13 
[PIRKW] mitochondrion 1e-115 
[PIRKW] alternative splicing 2e-09 
[PIRKW] methylated amino acid 1e-115 
[PIRKW] chloroplast 2e-14 
[PIRKW] homodimer 1e-115 
[SUPFAM] hypothetical protein YFR045w 3e-07 
[SUPFAM] ADP, ATP carrier protein 1e-115 
[SUPFAM] Bt1 protein 2e-14 
[SUPFAM] ADP, ATP carrier protein repeat homology 1e-115 
[SUPFAM] probable carrier protein YPR021c 1e-12 
[PROSITE] MITOCH_CARRIER 3 
[PFAM] Mitochondrial carrier proteins 
[KW] TRANSMEMBRANE 2 
[KW] LOW_COMPLEXITY 4.76 % 



SEQ MHREPAKKKAEKRLFDASSFGKDLLAGGVAAAVSKTAVAPIERVKLLLQVQASSKQISPE 

SEG 

PRD ccchhhhhhhhhhhhhchhhhhhhhhchhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhh 

MEM 

SEQ ARYKGMVDCLVRIPREQGFFSFWRGNLANVIRYFPTQALNFAFKDKYKQLFMSGVNKEKQ 

SEG 

PRD hhhhhhhheeeeccccceeeeecccccceeeeecccchhhhhhhhhhhhhhccccccccc 

MEM 

SEQ FWRWFLANLASGGAAGATSLCVVYPLDFARTRLGVDIGKGPEERQFKGLGDCIMKIAKSD 

SEG xxxxxxxxxxxxxxx 

PRD eeeecccccccccccceeeeeeeccchhhhhhhhhhccccchhhhhhcccceeeeeeccc 

MEM 

SEQ GIAGLYQGFGVSVQGIIVYRASYFGAYDTVKGLLPKPKKTPFLVSFFIAQVVTTCSGILS 

SEG 

PRD cccccccccceeeccceeehhhhhccccccccccccccccccchhhhhhhhhhheeeeec 

MEM .... MMMMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMM 

SEQ YPFDTVRRRMMMQSGEAKRQYKGTLDCFVKIYQHEGISSFFRGAFSNVLRGTGGALVLVL 

SEG 

PRD cccchhhhhhhhhcccceeeecccchhhhhhhhhcccccccccchhhhhccccceeeeee 

MEM MMMMMMMMMMM 

SEQ YDKIKEFFHIDIGGR 

SEG 

PRD hhhhhhheeeecccc 

MEM 
Prosite for DKFZphtes3_35nl2.2 

PS00215 40->50 MITOCH_CARRIER PDOC00189 
PS00215 145->155 MITOCH_CARRIER PDOC00189 
PS00215 242->252 MITOCH_CARRIER PDOC00189 



Pfam for DKFZphtes3_35nl2.2 
HMM_NAME Mitochondrial carrier proteins 

HMM *pFwkdFLAGGIAGmMeHTvMFPIDtIKTRMQlQgEMpM..ahpRYkGMI 

+F+KD+LAGG+A+++++T+++PI+++K+++Q+Q +++ RYKGM+ 
Query 19 SFGKDLLAGGVAAAVSKTAVAPIERVKLLLQVQASSKQISPEARYKGMV 67 

HMM dCFRwIwkNEGWRGLWRGLgANvIRYIPqWalRFGFYEFMKeMFiDyfge 
DC+ +I++++G++++WRG++ANVIRY+P++A++F+F++ +K +F + +++ 
Query 68 DCLVRIPREQGFFSFWRGNLANVIRYFPTQALNFAFKDKYKQLFMSGVNK 117 

HMM ddnyWmWFwmnYMaGsmAGEwisvIitYPMWvVKTRLQaDqkHphsQp.R 

++W+WF+ N+++G++AG ++S+ ++YP+++++TRL D +++++ R 
Query 118 EKQFWRWFLANLASGGAAG-ATSLCVVYPLDFARTRLGVD--IGKGPEER 164 

HMM hYNGvWNcWrklYReEGgFkGLYRGWtPTWMRMIPYqndYFfvYEtLKeW 
+++G+ +C KI +++G ++GLY+G++ +++++I+Y++ YF++Y+T K + 
Query 165 QFKGLGDCIMKIAKSDG-IAGLYQGFGVSVQGIIVYRASYFGAYDTVKGL 213 

HMM lynYtgYnPgprelCMddsPwWhWilgWmlAGMiaWivSYPfDVVRTRMM 
L +++ + ++++++I++ ++ ++++I+SYPFD+VR+RMM 

Query 214 LP KPK -- KTPFLVSFFIAQVVT-TCSGILSYPFDTVRRRMM 251 

HMM Mdsm.edhkYqSmlDCWMqlYKnEGFkGFWKGFWPRIMRiMPWtAIMFml 

M+S+ ++++Y+++LDC+++IY++EG+ +F++G+ +++R+ ++A+++++ 
Query 252 MQSGEAKRQYKGTLDCFVKIYQHEGISSFFRGAFSNVLRGT-GGALVLVL 300 

HMM YEqMKwFL* 
Y+ +K+F+ 

Query 301 YDKIKEFF 308 
DKFZphtes3_35n24 



group: testes derived 

DKFZphtes3_35n24 encodes a novel 365 amino acid protein without similarity to known proteins. 

The novel protein contains a Prosite Ig (immunoglobulin)-MHC pattern. This pattern represents a 
domain (the Ig domain) approximately one hundred amino acids long that includes a conserved 
intra-domain disulfide bond. Thus, the novel protein is a new member of the Ig superfamily. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus : unknown 

Insert length: 1589 bp 

Poly A stretch at pos. 1579, polyadenylation signal at pos. 1560 



1 CGATCGTCAC GTGACGCCGG GGTTCAGCGT ATCCTTGCTG GGCAACCGTC 
51 TTAGAGACCA GCACTGCTGG CTGCACCATG AATGTGATCT ACCCACTGGC 
101 AGTCCCCAAG GGGCGCAGAC TCTGCTGTGA GGTGTGCGAA GCCCCAGCCG 
151 AGCGGGTGTG CGCGGCCTGC ACAGTCACTT ATTACTGTGG GGTGGTACAT 
201 CAGAAGGCTG ACTGGGACAG CATCCATGAG AAAATATGTC AGCTCTTGAT 
251 TCCACTGCGC ACTTCCATGC CCTTCTACAA TTCAGAGGAA GAACGGCAGC 
301 ATGGCCTGCA GCAGCTGCAG CAGCGGCAGA AGTATTTGAT TGAATTCTGC 
351 TACACCATAG CCCAGAAATA CCTCTTTGAA GGGAAACACG AAGATGCTGT 
401 ACCAGCAGCT TTGCAGTCCC TTCGCTTCCG TGTGAAGCTG TATGGCCTGA 
451 GCTCCGTAGA GCTTGTGCCT GCTTACCCGC TGTTGGCCGA GGCCAGCCTT 
501 GGTCTGGGCC GAATCGTTCA GGCTGAAGAA TATCTATTCC AAGCCCAGTG 
551 GACAGTCCTC AAATCAACTG ACTGTAGTAA TGCCACCCAC TCTTTACTGC 
601 ATCGGAATCT GGGACTTCTC TATATAGCTA AGAAAAACTA TGAAGAGGCC 
651 CGTTATCATC TGGCCAATGA TATTTATTTT GCCAGTTGTG CATTTGGAAC 
701 AGAGGACATT AGGACTTCAG GAGGCTACTT CCACCTGGCT AATATATTCT 
751 ATGACCTTAA AAAGTTGGAC CTGGCAGACA CATTGTACAC CAAGGTCTCT 
801 GAGATCTGGC ATGCATATTT GAACAATCAC TATCAAGTCC TCTCACAGGC 
851 TCACATCCAA CAAATGGATT TACTGGGCAA ACTATTTGAG AATGACACTG 
901 GCTTGGATGA AGCCCAAGAA GCAGAAGCCA TTCGCATCCT GACTTCAATC 
951 TTGAACATTC GAGAATCTAC ATCTGACAAA GCCCCCCAAA AAACCATCTT 
1001 TGTTCTGAAG ATCCTGGTCA TGCTTTACTA CCTGATGATG AATTCTTCAA 
1051 AGGCACAGGA ATATGGCATG AGGGCCCTCA GTCTAGCCAA AGAACAACAG 
1101 CTTGATGTCC ATGAGCAAAG CACCATTCAA GAGTTATTAA GTCTCATTTC 
1151 AACTGAAGAC CATCCCATTA CTTAGTGACC CATGAGCTCT GCATCAAGGG 
1201 TTATTCCAGG GGCTACTGAA GATCTAATAT ATTCCAGCCT TGCACAACTG 
1251 CTTTGAGGTA CTGTAGACTG CTGAAGTTTC CACCCTCTTC CCCTGGGATT 
1301 GCACACATAG CTGTTATTTT TTTCTTACAC AGCATATTAA GGGAATATAA 
1351 AGCTTTAGGC ATAGAAATCA CTAAAAACTG TGTTTGTCAT GACCTTTGTA 
1401 CTTGATTTAT CATGACTTTG TATGACTGAG TAATATGTAG TCAGATCACT 
1451 AATATGGTAT TTGTAATTAA ACTACAAATA GTTTGTCATT TCCCAGAAGT 
1501 CTTCCAACGA TGCATGTTTC ATACACTTTT GCTAAAGGAG GGGTAAAGGA 
1551 GGGGGTAGGG AATAAAGCTA TATTGGAACA AAAAAAAAA 
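
The poly A stretch and polyadenylation signal positions quoted for this insert can be located with a simple string search over the 3' end. This is a minimal sketch, not the annotation pipeline's actual method: the function names are hypothetical, the fragment is the last row of the sequence (bases 1551-1589), and it assumes the listing reports zero-based offsets, which is what makes the values agree with 1560 and 1579.

```python
def find_polya_signal(seq: str, offset: int = 0, signal: str = "AATAAA") -> int:
    """Return the offset-adjusted, zero-based start of the last signal, or -1."""
    idx = seq.rfind(signal)
    return -1 if idx < 0 else offset + idx


def polya_tail_start(seq: str, offset: int = 0) -> int:
    """Return the zero-based start of the terminal run of A residues."""
    return offset + len(seq.rstrip("A"))


# Last row of the insert; the first base shown is base 1551 (zero-based 1550).
tail_fragment = (
    "GGGGGTAGGG"
    "AATAAAGCTA"
    "TATTGGAACA"
    "AAAAAAAAA"
)

signal_pos = find_polya_signal(tail_fragment, offset=1550)
tail_pos = polya_tail_start(tail_fragment, offset=1550)
```

With one-based numbering the same features would start at 1561 and 1580.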



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 78 bp to 1172 bp; peptide length: 365 
Category: putative protein 
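
The peptide length follows directly from the ORF coordinates, assuming both are one-based and inclusive and that the stop codon lies outside the quoted range. A small sketch (the function name `orf_peptide_length` is hypothetical) that checks this for the ORFs listed in this section:

```python
def orf_peptide_length(start: int, end: int) -> int:
    """Number of codons in an ORF given one-based inclusive coordinates."""
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3


# (start, end) pairs as quoted in the peptide information blocks.
lengths = {
    "35nl2 frame 2": orf_peptide_length(128, 1072),
    "35n24 frame 3": orf_peptide_length(78, 1172),
    "35n9 frame 3": orf_peptide_length(954, 2774),
    "35pl7 frame 3": orf_peptide_length(99, 1613),
}
```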
Prosite motifs: IG_MHC (35-42) 



1 MNVIYPLAVP KGRRLCCEVC EAPAERVCAA CTVTYYCGVV HQKADWDSIH 
51 EKICQLLIPL RTSMPFYNSE EERQHGLQQL QQRQKYLIEF CYTIAQKYLF 
101 EGKHEDAVPA ALQSLRFRVK LYGLSSVELV PAYPLLAEAS LGLGRIVQAE 
151 EYLFQAQWTV LKSTDCSNAT HSLLHRNLGL LYIAKKNYEE ARYHLANDIY 
201 FASCAFGTED IRTSGGYFHL ANIFYDLKKL DLADTLYTKV SEIWHAYLNN 
251 HYQVLSQAHI QQMDLLGKLF ENDTGLDEAQ EAEAIRILTS ILNIRESTSD 
301 KAPQKTIFVL KILVMLYYLM MNSSKAQEYG MRALSLAKEQ QLDVHEQSTI 
351 QELLSLISTE DHPIT 
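
The peptide can be checked against the nucleotide listing by translating from the ORF start. The sketch below assumes the standard genetic code; the names `CODON_TABLE` and `translate` are illustrative, and the input is the first 33 bases of the ORF (positions 78-110 of the insert).

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate in frame 1 until a stop codon or the end of the sequence."""
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Bases 78-110 of the DKFZphtes3_35n24 insert.
orf_start = "ATGAATGTGATCTACCCACTGGCAGTCCCCAAG"
n_terminus = translate(orf_start)
```

Translating the full ORF the same way is one way to resolve residues that are ambiguous in the scanned peptide rows.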

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35n24, frame 3 
No Alert blastp hits found 

Pedant information for DKFZphtes3_35n24, frame 3 



Report for DKFZphtes3_35n24.3 

[LENGTH] 365 
[MW] 41768.24 
[pI] 5.82 
[BLOCKS] BL00273 Heat-stable enterotoxins proteins 
[PROSITE] MYRISTYL 1 
[PROSITE] IG_MHC 1 
[PROSITE] AMIDATION 1 
[PROSITE] CK2_PHOSPHO_SITE 7 
[PROSITE] TYR_PHOSPHO_SITE 4 
[PROSITE] PKC_PHOSPHO_SITE 3 
[PROSITE] ASN_GLYCOSYLATION 3 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 4.11 % 



SEQ MNVIYPLAVPKGRRLCCEVCEAPAERVCAACTVTYYCGVVHQKADWDSIHEKICQLLIPL 

SEG 

PRD ccceeeeeccccceeeeeeeehhhhhhhheeeeeeeeeecccccccchhhhhhhhheeec 

SEQ RTSMPFYNSEEERQHGLQQLQQRQKYLIEFCYTIAQKYLFEGKHEDAVPAALQSLRFRVK 

SEG xxxxxxxxxxxxxxx 

PRD cccccccchhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhh 

SEQ LYGLSSVELVPAYPLLAEASLGLGRIVQAEEYLFQAQWTVLKSTDCSNATHSLLHRNLGL 

SEG 

PRD hhccceeeeccccchhhhhccccchhhhhhhhhhhhhhhccccccccccccccccccccc 

SEQ LYIAKKNYEEARYHLANDIYFASCAFGTEDIRTSGGYFHLANIFYDLKKLDLADTLYTKV 

SEG 

PRD eeeehhhhhhhhhhhhhheeeeeccccccccccccceeehhhhhhhhhhhhccceeeeeh 

SEQ SEIWHAYLNNHYQVLSQAHIQQMDLLGKLFENDTGLDEAQEAEAIRILTSILNIRESTSD 

SEG 

PRD hhhhhhhhcccchhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhccccc 

SEQ KAPQKTIFVLKILVMLYYLMMNSSKAQEYGMRALSLAKEQQLDVHEQSTIQELLSLISTE 

SEG 

PRD ccccceeeehhhhhhhhhhhhcccchhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhcc 

SEQ DHPIT 

SEG 

PRD ccccc 



Prosite for DKFZphtes3_35n24.3 

PS00001 168->172 ASN_GLYCOSYLATION PDOC00001 
PS00001 272->276 ASN_GLYCOSYLATION PDOC00001 
PS00001 322->326 ASN_GLYCOSYLATION PDOC00001 
PS00005 114->117 PKC_PHOSPHO_SITE PDOC00005 
PS00005 299->302 PKC_PHOSPHO_SITE PDOC00005 
PS00005 323->326 PKC_PHOSPHO_SITE PDOC00005 
PS00006 48->52 CK2_PHOSPHO_SITE PDOC00006 
PS00006 69->73 CK2_PHOSPHO_SITE PDOC00006 
PS00006 125->129 CK2_PHOSPHO_SITE PDOC00006 
PS00006 274->278 CK2_PHOSPHO_SITE PDOC00006 
PS00006 297->301 CK2_PHOSPHO_SITE PDOC00006 
PS00006 349->353 CK2_PHOSPHO_SITE PDOC00006 
PS00006 358->362 CK2_PHOSPHO_SITE PDOC00006 
PS00007 85->93 TYR_PHOSPHO_SITE PDOC00007 
PS00007 186->194 TYR_PHOSPHO_SITE PDOC00007 
PS00007 186->194 TYR_PHOSPHO_SITE PDOC00007 
PS00007 185->194 TYR_PHOSPHO_SITE PDOC00007 
PS00008 275->281 MYRISTYL PDOC00008 
PS00009 11->15 AMIDATION PDOC00009 
PS00290 35->42 IG_MHC PDOC00262 



(No Pfam data available for DKFZphtes3_35n24 . 3) 
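
The ASN_GLYCOSYLATION hits in the Prosite table can be reproduced with a regular expression for the PS00001 pattern N-{P}-[ST]-{P}. The sketch below is an illustration only, not the Prosite scanner itself: the name `scan_asn_glycosylation` is hypothetical, the fragment is residues 161-180 of the peptide, and it assumes the second coordinate in the listing is one past the last matched residue, as the 168->172 entry for a four-residue site suggests.

```python
import re

# PS00001: N, any residue except P, S or T, any residue except P.
# The lookahead lets overlapping sites be reported as well.
ASN_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def scan_asn_glycosylation(seq: str, offset: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) pairs; start is one-based, end is start + 4."""
    return [
        (offset + m.start(), offset + m.start() + 4)
        for m in ASN_GLYCOSYLATION.finditer(seq)
    ]


# Residues 161-180 of the DKFZphtes3_35n24 peptide.
fragment = "LKSTDCSNATHSLLHRNLGL"
hits = scan_asn_glycosylation(fragment, offset=161)
```

The second asparagine in the fragment (N177) is followed by L-G and so is not reported.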
DKFZphtes3_35n9 



group: metabolism 

DKFZphtes3_35n9 encodes a novel 607 amino acid protein which is a splice variant of human 
carboxylesterase (EC 3.1.1.1). 

The novel protein contains one carboxylesterase B1 pattern and one B2 pattern. In comparison to 
EC 3.1.1.1, DKFZphtes3_35n9 shows an N-terminal extension, and aa 458-474 are missing. 

The new protein can find application in modulation of carboxylester metabolism and as a new 
enzyme for biotechnological production processes. 



carboxylesterase, splice variant 

5' extension of mRNA and N-terminal elongation of protein (64 aa); 
missing exon: aa 458-474 of JC5408 are missing 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 2888 bp 

Poly A stretch at pos. 2878, no polyadenylation signal found 



1 CTCGGCCTGA GGTGCGAGAG AAGCGGTGAC CGCGGCCCTG GCTGCTCGGA 
51 CCCGGGAACA TGATGGTCGC TGGAGCAGAA GGCGCTGAGA AGGGACCACG 
101 GCGGCGCTGG GTCGTGCGAG CCAGTAGCGG GCTGAAACGT AGAGGCCAGA 
151 ACCAGGTCTC AGGGGGCACT AAAGGCGGTC GGAGGTAATC CCCACACCGC 
201 TTCCTCCTGG AAGTCAGGCT GGCCGGGAGC TCCCGTATCC AGGACGGTTG 
251 GTCGCCTCTG GCCTGGCAGG GATCCTAGTG TCTCGGGACC TCCCGGTGAC 
301 GCGCCTGCCT CCCCTGCTGC ACCATAGGCC CGGGAGTACG GCGTCCCCAC 
351 AGCTTGGACC GGCAGGGGCT CGTGAAATGT TTGTCAAGTG GATAAATGAC 
401 CATGGCCGTG GTCTCCGCGG GAGGTGAGGA AACTGAAAGC CACCGAGGAA 
451 AAGGGGGGCG CTCCTTAAGA AGTGCCGCGG TCACGTGTAC GTTTCAAAAG 
501 AATGGCGTGA CTGAGTAGGG AGGGGACCGC GGAGACCCTC AGACCCTGGA 
551 CTGTAAGGAG ATGAGGGGCC GTGAAGGGGA ACCCAGGAAA CTGAGTCCTG 
601 AAAGCAAGGA GGAACTTCCA GAATGAAGGG CGCCGACACT CCTTCCTGCC 
651 TTTGCTCAAG CGGTTCCTTC ACCCCGATCA AGTTCCTTCC CATTTCTCCA 
701 TCTGGGGGAT CCTGAACGTG CACATCCTCA GAGAAGCCCT CCTGGGGTCT 
751 CCAATTCTAG TTTATTGCCC CCTCCTATCG ATCCCCCAGC GCGCTCATCG 
801 GGCCTGTGGA CAAGGACAGG TTTGAAGAGA GGATTCCCTG GATCGCGGAA 
851 GGGCTGCAGG AATGGCACAG CCCCTTCCGA GGATGCCAAA GGAGCCCGGG 
901 CAAAGGAAAG TGGCCGTGCC CGGGCCTGCC TACCACTAGA TCCCCACCCA 
951 CCTATGACTG CTCAGTCCCG CTCTCCTACC ACACCCACCT TTCCCGGCCC 
1001 AAGCCAGCGC ACCCCGCTGA CTCCCTGCCC AGTCCAAACT CCAAGGCTGG 
1051 GCAAGGCACT GATCCACTGC TGGACAGACC CGGGGCAGCC TCTGGGTGAA 
1101 CAGCAGCGTG TCCGCCGGCA GCGAACCGAG ACCAGCGAGC CGACCATGCG 
1151 GCTGCACAGA CTTCGTGCGC GGCTGAGCGC GGTGGCCTGT GGGCTTCTGC 
1201 TGCTTCTTGT CCGGGGCCAG GGCCAGGACT CAGCCAGTCC CATCCGGACC 
1251 ACACACACGG GGCAGGTGCT GGGGAGTCTT GTCCATGTGA AGGGCGCCAA 
1301 TGCCGGGGTC CAAACCTTCC TGGGAATTCC ATTTGCCAAG CCACCTCTAG 
1351 GTCCGCTGCG ATTTGCACCC CCTGAGCCCC CTGAATCTTG GAGTGGTGTG 
1401 AGGGATGGAA CCACCCATCC GGCCATGTGT CTACAGGACC TCACCGCAGT 
1451 GGAGTCAGAG TTTCTTAGCC AGTTCAACAT GACCTTCCCT TCCGACTCCA 
1501 TGTCTGAGGA CTGCCTGTAC CTCAGCATCT ACACGCCGGC CCATAGCCAT 
1551 GAAGGCTCTA ACCTGCCGGT GATGGTGTGG ATCCACGGTG GTGCGCTTGT 
1601 TTTTGGCATG GCTTCCTTGT ATGATGGTTC CATGCTGGCT GCCTTGGAGA 
1651 ACGTGGTGGT GGTCATCATC CAGTACCGCC TGGGTGTCCT GGGCTTCTTC 
1701 AGCACTGGAG ACAAGCACGC AACCGGCAAC TGGGGCTACC TGGACCAAGT 
1751 GGCTGCACTA CGCTGGGTCC AGCAGAATAT CGCCCACTTT GGAGGCAACC 
1801 CTGACCGTGT CACCATTTTT GGCGAGTCTG CGGGTGGCAC GAGTGTGTCT 
1851 TCGCTTGTTG TGTCCCCCAT ATCCCAAGGA CTCTTCCACG GAGCCATCAT 
1901 GGAGAGTGGC GTGGCCCTCC TGCCCGGCCT CATTGCCAGC TCAGCTGATG 
1951 TCATCTCCAC GGTGGTGGCC AACCTGTCTG CCTGTGACCA AGTTGACTCT 
2001 GAGGCCCTGG TGGGCTGCCT GCGGGGCAAG AGTAAAGAGG AGATTCTTGC 
2051 AATTAACAAG CCTTTCAAGA TGATCCCCGG AGTGGTGGAT GGGGTCTTCC 
2101 TGCCCAGGCA CCCCCAGGAG CTGCTGGCCT CTGCCGACTT TCAGCCTGTC 
2151 CCTAGCATTG TTGGTGTCAA CAACAATGAA TTCGGCTGGC TCATCCCCAA 
2201 GGTCATGAGG ATCTATGATA CCCAGAAGGA AATGGACAGA GAGGCCTCCC 
2251 AGGCTGCTCT GCAGAAAATG TTAACGCTGC TGATGTTGCC TCCTACATTT 
2301 GGTGACCTGC TGAGGGAGGA GTACATTGGG GACAATGGGG ATCCCCAGAC 
2351 CCTCCAAGCG CAGTTCCAGG AGATGATGGC GGACTCCATG TTTGTGATCC 
2401 CTGCACTCCA AGTAGCACAT TTTCAGTGTT CCCGGGCCCC TGTGTACTTC 
2451 TACGAGTTCC AGCATCAGCC CAGCTGGCTC AAGAACATCA GGCCACCGCA 
2501 CATGAAGGCA GACCATGTTA AATTCACTGA GGAAGAGGAG CAGCTAAGCA 
2551 GGAAGATGAT GAAGTACTGG GCCAACTTTG CGAGAAATGG GAACCCCAAT 
2601 GGCGAGGGTC TGCCACACTG GCCGCTGTTC GACCAGGAGG AGCAATACCT 
2651 GCAGCTGAAC CTACAGCCTG CGGTGGGCCG GGCTCTGAAG GCCCACAGGC 
2701 TCCAGTTCTG GAAGAAGGCG CTGCCCCAAA AGATCCAGGA GCTCGAGGAG 
2751 CCTGAAGAGA GACACACAGA GCTGTAGCTC CCTGTGCCGG GGAGGAGGGG 
2801 GTGGGTTCGC TGACAGGCGA GGGTCAGCCT GCTGTGCCCA CACACACCCA 
2851 CTAAGGAGAA AGAAGTTGAT TCCTTCATAA AAAAAAAA 



BLAST Results 



Entry D50579 from database EMBL: 

Homo sapiens mRNA for carboxylesterase, complete cds. 
Score = 7197, P = 0.0e+00, identities = 1441/1443 

Entry JC5408 from database PIR: 
carboxylesterase (EC 3.1.1.1) - human 

Score = 2808, P = 1.2e-291, identities = 542/559, positives = 543/559, 
frame +3 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 954 bp to 2774 bp; peptide length: 607 
Category: known protein 
Classification: Metabolism 

Prosite motifs: CARBOXYLESTERASE_B_1 (279-295) 
CARBOXYLESTERASE_B_2 (185-196) 



1 MTAQSRSPTT PTFPGPSQRT PLTPCPVQTP RLGKALIHCW TDPGQPLGEQ 
51 QRVRRQRTET SEPTMRLHRL RARLSAVACG LLLLLVRGQG QDSASPIRTT 
101 HTGQVLGSLV HVKGANAGVQ TFLGIPFAKP PLGPLRFAPP EPPESWSGVR 
151 DGTTHPAMCL QDLTAVESEF LSQFNMTFPS DSMSEDCLYL SIYTPAHSHE 
201 GSNLPVMVWI HGGALVFGMA SLYDGSMLAA LENVVVVIIQ YRLGVLGFFS 
251 TGDKHATGNW GYLDQVAALR WVQQNIAHFG GNPDRVTIFG ESAGGTSVSS 
301 LVVSPISQGL FHGAIMESGV ALLPGLIASS ADVISTVVAN LSACDQVDSE 
351 ALVGCLRGKS KEEILAINKP FKMIPGVVDG VFLPRHPQEL LASADFQPVP 
401 SIVGVNNNEF GWLIPKVMRI YDTQKEMDRE ASQAALQKML TLLMLPPTFG 
451 DLLREEYIGD NGDPQTLQAQ FQEMMADSMF VIPALQVAHF QCSRAPVYFY 
501 EFQHQPSWLK NIRPPHMKAD HVKFTEEEEQ LSRKMMKYWA NFARNGNPNG 
551 EGLPHWPLFD QEEQYLQLNL QPAVGRALKA HRLQFWKKAL PQKIQELEEP 
601 EERHTEL 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35n9, frame 3 

PIR:JC5408 carboxylesterase (EC 3.1.1.1) - human, N = 1, Score = 2808, P = 1.9e-292 

TREMBL:HSU60553_1 gene: "hCE-2"; product: "carboxylesterase"; Human carboxylesterase (hCE-2) mRNA, complete cds., N = 1, Score = 2761, P = 1.8e-287 

PIR:A34329 60K esterase (EC 3.1.1.-) isoform 2 - rabbit, N = 1, Score = 1985, P = 3.1e-205 

TREMBL:D50580_1 product: "carboxylesterase precursor"; Rattus norvegicus mRNA for carboxylesterase, partial cds., N = 1, Score = 1984, P = 4e-205 



>PIR:JC5408 carboxylesterase (EC 3.1.1.1) - human 
Length = 559 

HSPs: 

Score = 2808 (421.3 bits), Expect = 1.9e-292, P = 1.9e-292 
Identities = 542/559 (96%), Positives = 543/559 (97%) 

Query:  65 MRLHRLRARLSAVACGLLLLLVRGQGQDSASPIRTTHTGQVLGSLVHVKGANAGVQTFLG 124 
           MRLHRLRARLSAVACGLLLLLVRGQGQDSASPIRTTHTGQVLGSLVHVKGANAGVQTFLG 
Sbjct:   1 MRLHRLRARLSAVACGLLLLLVRGQGQDSASPIRTTHTGQVLGSLVHVKGANAGVQTFLG 60 

Query: 125 IPFAKPPLGPLRFAPPEPPESWSGVRDGTTHPAMCLQDLTAVESEFLSQFNMTFPSDSMS 184 
           IPFAKPPLGPLRFAPPEPPESWSGVRDGTTHPAMCLQDLTAVESEFLSQFNMTFPSDSMS 
Sbjct:  61 IPFAKPPLGPLRFAPPEPPESWSGVRDGTTHPAMCLQDLTAVESEFLSQFNMTFPSDSMS 120 

Query: 185 EDCLYLSIYTPAHSHEGSNLPVMVWIHGGALVFGMASLYDGSMLAALENVVVVIIQYRLG 244 
           EDCLYLSIYTPAHSHEGSNLPVMVWIHGGALVFGMASLYDGSMLAALENVVVVIIQYRLG 
Sbjct: 121 EDCLYLSIYTPAHSHEGSNLPVMVWIHGGALVFGMASLYDGSMLAALENVVVVIIQYRLG 180 

Query: 245 VLGFFSTGDKHATGNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSSLVVS 304 
           VLGFFSTGDKHATGNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSSLVVS 
Sbjct: 181 VLGFFSTGDKHATGNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSSLVVS 240 

Query: 305 PISQGLFHGAIMESGVALLPGLIASSADVISTVVANLSACDQVDSEALVGCLRGKSKEEI 364 
           PISQGLFHGAIMESGVALLPGLIASSADVISTVVANLSACDQVDSEALVGCLRGKSKEEI 
Sbjct: 241 PISQGLFHGAIMESGVALLPGLIASSADVISTVVANLSACDQVDSEALVGCLRGKSKEEI 300 

Query: 365 LAINKPFKMIPGVVDGVFLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRIYDTQ 424 
           LAINKPFKMIPGVVDGVFLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRIYDTQ 
Sbjct: 301 LAINKPFKMIPGVVDGVFLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRIYDTQ 360 

Query: 425 KEMDREASQAALQKMLTLLMLPPTFGDLLREEYIGDNGDPQTLQAQFQEMMADSMFVIPA 484 
           KEMDREASQAALQKMLTLLMLPPTFGDLLREEYIGDNGDPQTLQAQFQEMMADSMFVIPA 
Sbjct: 361 KEMDREASQAALQKMLTLLMLPPTFGDLLREEYIGDNGDPQTLQAQFQEMMADSMFVIPA 420 

Query: 485 LQVAHFQCSRAPVYFYEFQHQPSWLKNIRPPHMKADH----------------VKFTEEE 528 
           LQVAHFQCSRAPVYFYEFQHQPSWLKNIRPPHMKADH                +KFTEEE 
Sbjct: 421 LQVAHFQCSRAPVYFYEFQHQPSWLKNIRPPHMKADHGDELPFVFRSFFGGNYIKFTEEE 480 

Query: 529 EQLSRKMMKYWANFARNGNPNGEGLPHWPLFDQEEQYLQLNLQPAVGRALKAHRLQFWKK 588 
           EQLSRKMMKYWANFARNGNPNGEGLPHWPLFDQEEQYLQLNLQPAVGRALKAHRLQFWKK 
Sbjct: 481 EQLSRKMMKYWANFARNGNPNGEGLPHWPLFDQEEQYLQLNLQPAVGRALKAHRLQFWKK 540 

Query: 589 ALPQKIQELEEPEERHTEL 607 
           ALPQKIQELEEPEERHTEL 
Sbjct: 541 ALPQKIQELEEPEERHTEL 559 

Pedant information for DKFZphtes3_35n9, frame 3 



Report for DKFZphtes3_35n9.3 

[LENGTH] 607 
[MW] 67051.20 
[pI] 6.11 
[HOMOL] PIR:JC5408 carboxylesterase (EC 3.1.1.1) - human 0.0 
[BLOCKS] BL01173A Lipolytic enzymes "G-D-X-G" family, histidine 
[BLOCKS] BL00122G 
[BLOCKS] BL00122F 
[BLOCKS] BL00122E 
[BLOCKS] BL00122D Carboxylesterases type-B serine proteins 
[BLOCKS] BL00122C Carboxylesterases type-B serine proteins 
[BLOCKS] BL00122B Carboxylesterases type-B serine proteins 
[BLOCKS] BL00122A Carboxylesterases type-B serine proteins 
[SCOP] d1akn 3.56.1.1.4 Bile-salt activated lipase [Bovine (Bos taurus 1e-158 
[SCOP] d2ack 3.56.1.1.1 Acetylcholinesterase [Electric ray (Torped 1e-170 
[SCOP] d1thg 3.56.1.9.7 type-B carboxylesterase/lipase [fungu 1e-149 
[EC] 3.1.1.13 Sterol esterase 1e-52 
[EC] 3.1.1.7 Acetylcholinesterase 5e-74 
[EC] 3.1.1.1 Carboxylesterase 0.0 
[EC] 3.1.1.8 Cholinesterase 5e-68 
[EC] 3.1.1.59 Juvenile-hormone esterase 1e-34 
[EC] 3.1.1.3 Triacylglycerol lipase 3e-52 
[PIRKW] duplication 2e-47 
[PIRKW] homotetramer 3e-67 
[PIRKW] transmembrane protein 9e-44 
[PIRKW] microsome 1e-130 
[PIRKW] pancreas 3e-52 
[PIRKW] endoplasmic reticulum 1e-134 
[PIRKW] homotrimer 1e-134 
[PIRKW] phosphatidylinositol linkage 5e-74 
[PIRKW] synapse 3e-73 
[PIRKW] liver 1e-131 
[PIRKW] heparin binding 3e-52 
[PIRKW] phosphoprotein 7e-25 
[PIRKW] glycoprotein 1e-134 
[PIRKW] thyroid hormone biosynthesis 2e-47 
[PIRKW] carboxylic ester hydrolase 0.0 
[PIRKW] monomer 2e-42 
[PIRKW] disulfide bond 2e-31 
[PIRKW] mammary gland 3e-52 
[PIRKW] alternative splicing 5e-74 
[PIRKW] iodine 2e-47 
[PIRKW] pyroglutamic acid 6e-39 
[PIRKW] hydrolase 1e-135 
[PIRKW] muscle 3e-73 
[PIRKW] thyroid gland 2e-47 
[PIRKW] [illegible] 
[PIRKW] neurotransmitter degradation 3e-7[illegible] 
[PIRKW] [illegible] 
[PIRKW] [illegible] 
[PIRKW] nerve 3e-73 
[SUPFAM] [illegible] 
[SUPFAM] triacylglycerol lipase 1e-32 
[SUPFAM] cholinesterase homology 0.0 
[SUPFAM] thyroglobulin 2e-47 
[SUPFAM] thyroglobulin type I repeat homology 2e-47 
[SUPFAM] juvenile-hormone esterase 2e-35 
[SUPFAM] probable lipolytic protein ybaC 1e-07 
[PROSITE] CARBOXYLESTERASE_B_2 1 
[PROSITE] CARBOXYLESTERASE_B_1 1 
[PFAM] Carboxylesterases 
[KW] Alpha_Beta 
[KW] 3D 
[KW] LOW_COMPLEXITY 3.95 % 



SEQ MTAQSRSPTTPTFPGPSQRTPLTPCPVQTPRLGKALIHCWTDPGQPLGEQQRVRRQRTET 

SEG xxxxxxxx. . . 

lacj- 

SEQ SEPTMRLHRLRARLSAVACGLLLLLVRGQGQDSASPIRTTHTGQVLGSLVHVKGANAGVQ 

SEG xxxxx 

lacj- ETTEEEECEEEEETTEE — EE 

SEQ TFLGIPFAKPPLGPLRFAPPEPPESWSGVRDGTTHPAMCLQDLTAVESEFLSQFNMTFPS 

SEG 

lacj- EEEEEECEETTTGGGTTTCCEECCCCCCEEECCCCCCBCCCCCCTTTTTT-HHHHHCCCC 

SEQ DSMSEDCLYLSIYTPAHSHEGSNLPVMVWIHGGALVFGMASLYDGSMLAALENVVVVIIQ 

SEG 

lacj- CCBTTTTCEEEEEET--TTTTTTEEEEEEECTTTTTTCTTTTGCHHHHHHHHCCEEEECC 

SEQ YRLGVLGFFSTGDKHATGNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSS 

SEG 

lacj- CCCCGGGCCCTTTTTTTCCHHHHHHHHHHHHHHHCGGGGCEEEEEEEEEEECHHHHHHHH 

SEQ LVVSPISQGLFHGAIMESGVALLPGLIASSADVISTVVANLSACDQVDSEALVGCLRGKS 

SEG 

lacj- HHHCGGGTTTTCEEEEETTTTTTTTTTBCHHHHHHHHHHHHC-CCCCCHHHHHHHHHHCC 

SEQ KEEILAINKPFKMIPGVVDGVFLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRI 

SEG 

lacj- HHHHHHHHTCCCTTTCBTTTTTTTTTHHHHHHHTTTCCCCEEEEEETBTHHHHHHTTTTT 

SEQ YDTQKEMDREASQAALQKMLTLLMLPPTFGDLLREEYIGDNGDPQTLQAQFQEMMADSMF 

SEG 

lacj- TTTCCCCCHHHHHHHHHHHTTTTCHHHHHHHHHHCTTTTTTTHHHH-HHHHHHHHHHHHH 

SEQ VIPALQVAHFQCSRAPVYFYEFQHQPSWLKNIRPPHMKADHVKFTEEEEQLSRKMMKYWA 

SEG 

lacj- HHHHHHHHHHHHCCCCEEEEEECCCCGGGTTBTTTHHHCGGGCCCHHHHHHHHHHHHHHH 

SEQ NFARNGNPNGEGLPHWPLFDQEEQYLQLNLQPAVGRALKAHRLQFWKKALPQKIQELEEP 

SEG xxxxx 

lacj - HHHHHCCCCCCC--CCCCBTTTTBEEEECCCCCEEETTTHHHHHHHHHHHHH 

SEQ EERHTEL 

SEG xxxxxx . 

lacj- 



Prosite for DKFZphtes3_35n9.3 

PS00122 279->295 CARBOXYLESTERASE_B_1 PDOC00112 
PS00941 185->196 CARBOXYLESTERASE_B_2 PDOC00112 



Pfam for DKFZphtes3_35n9.3 

HMM_NAME Carboxylesterases 

HMM *MfMnwlimFLLwmItWli.WheqaprpPdPyiVdtnnCGkIRGmNedtD 
+ +L+++ ++++++++ ++Q++++P I T+ G + G ++ + 
Query 69 RLRARLSAVACGLLLLLVRGQGQDSASP IRTTHT-GQVLGSLVHVK 113 

HMM NG..pYYvFlGIPYAEPPVGNLRFKePQPYhePWtNVWNATnYPPMCMQW 
+ + +FLGI P+A+PP+G LRF +P+P +E W++V++ T+ P MC+Q+ 
Query 114 GANAGVQTFLGIPFAKPPLGPLRFAPPEP-PESWSGVRDGTTHPAMCLQD 162 

HMM ndFGFWlFdmieMWNeniP..eMSEDCLYLNVWTPWnrkPNskLPVMVWI 
+++ +++N++ P +MSEDCLYL+++TP+ + ++S+LPVMVWI 
Query 163 LTAV--ESEFLSQFNMTFPSDSMSEDCLYLSIYTPAHSHEGSNLPVMVWI 210 

HMM HGGGFMFGSGhsYPliqYDgeylMMeeNVIVVtlNYRLGPFGFLSTgDid 
HGG+++FG + ++YDG+ L++ ENV+VV I+YRLG++GF+STGD + 
Query 211 HGGALVFGMA SLYDGSMLAALENVVVVIIQYRLGVLGFFSTGDKH 255 

HMM lPPHGNWGLWDQRMALQWVQDNIAnFGGDPNNITIFGESAGGMSVHlHML 
+ GNWG++DQ++AL+WVQ+NIA+FGG+P+++TIFGESAGG+SV+ + + 
Query 256 AT--GNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSSLVV 303 

HMM SYGGDNPPmfKqLFHRAIMQSGsAmcPWvIQsnyNaRqRAfRFArimGCN 
S P + +LFH AIM+SG A+ P++I S++ + +A++ C+ 
Query 304 SPISQGLFHGAIMESGVALLPGLIASSA--DVISTVVANLSACD 345 

HMM rmDssEMIqCLRsKPwEELWdAtWnFWmWfYfPFlPWFFgPVIDGDDaPE 
+ DS++++ CLR K+ EE++++++ +F + + +DG+ 
Query 346 QVDSEALVGCLRGKSKEEILAINK PFKMIPGV VDGV 381 

HMM aFlPDHPeeMIkEGkFnDVPWllGYNnDEGiWFapMmMnfnWfdEDeWId 
F+P+HP+E++++ F VP I+G+NN E++W++P M + + +E++ 
Query 382 -FLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRIYDT-QKEMDR 429 

HMM itNedWyeWMPYHFYrddmsNikDMDDYiDkvyEeYPgWWDrFPqESYW 
++ + ++ M +L + + + D ++EEY+G+ + PQ 
Query 430 EASQAALQKMLTLLMLPPT-F GDLLREEYIGDNGD-PQTLQA 469 

HMM nLqDMFTDYLFWCPtRihadnHRkHwgsPVYMYeFDHPpSFGYgQFFmWR 
++Q+M+ D F++P + ++H++ +PVY+YEF+H PS + 
Query 470 QFQEMMADSMFVIP--ALQVAHFQCSRAPVYFYEFQHQPSW LKN 511 

HMM WWPpWMgvdH* 
+PP+M++DH 
Query 512 IRPPHMKADH 521 

HMM *tEEEiissMRmMMNYWINFAKhGNPNnthnglCWWPqYTsnEQYdMIMe 
TEEE+ +S R MM+YW+NFA++GNPN++ GL++WP ++++EQY++ + 
Query 525 TEEEEQLS-RKMMKYWANFARNGNPNGE--GLPHWPLFDQEEQYLQLNL 570 

HMM tllmiQmCrrarDPYCNFW* 
+ +++++ + FW 
Query 571 QPAVGRALKAHR---LQFW 586 
DKFZphtes3_35pl7 



group: testes derived 

DKFZphtes3_35pl7 encodes a novel 505 amino acid protein with weak similarity to 
proteins of the armadillo family. 

Proteins of the armadillo family are involved in diverse cellular processes in higher 
eukaryotes. Some of them, like armadillo, beta-catenin and plakoglobins, have dual functions 
in intercellular junctions and signalling cascades. Others, belonging to the importin-alpha 
subfamily, are involved in NLS recognition and nuclear transport, while some members of the 
armadillo family have as yet unknown functions. The novel protein shows similarity to S. 
cerevisiae protein Yel013p (VAC8) and Danio rerio beta-catenin, but contains no armadillo (arm) 
repeats. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to S. cerevisiae VAC8 

complete cDNA, complete cds, few EST hits 

Sequenced by DKFZ 

Locus : unknown 

Insert length: 1966 bp 

Poly A stretch at pos. 1956, polyadenylation signal at pos. 1935 



1 AAGTCAAATG TAAGATTGGT TCATTAAAAA TACTGAAGGA AATCAGTCAT 
51 AATCCTCAAA TCAGACAGAA TATTGTTGAC CTTGGGGGCT TACCAATTAT 
101 GGTGAATATA CTTGATTCTC CACACAAGAG TCTAAAATGT TTGGCAGCCG 
151 AGACTATCGC GAATGTTGCC AAGTTTAAAA GAGCACGGCG GGTGGTGAGG 
201 CAGCACGGGG GTATCACCAA ACTGGTTGCT CTACTAGACT GTGCACATGA 
251 TTCCACAAAA CCTGCCCAAT CGAGTCTGTA TGAGGCCAGA GACGTGGAAG 
301 TGGCTCGCTG TGGGGCACTG GCCCTGTGGA GCTGCAGTAA GAGTCATACG 
351 AATAAAGAAG CCATCCGCAA AGCTGGGGGC ATTCCTCTGT TGGCTCGGCT 
401 GCTGAAGACT TCTCATGAAA ACATGCTAAT TCCAGTGGTG GGGACATTGC 
451 AAGAGTGTGC ATCAGAGGAA AACTACCGGG CTGCAATCAA AGCAGAAAGG 
501 ATCATTGAAA ACCTTGTCAA GAACCTAAAT AGTGAGAATG AGCAGCTGCA 
551 GGAGCACTGC GCCATGGCCA TTTACCAGTG TGCTGAAGAT AAGGAAACCC 
601 GGGACCTCGT TAGGCTGCAC GGAGGACTTA AGCCCTTGGC CAGTCTACTC 
651 AATAACACTG ACAATAAAGA GCGGTTAGCT GCTGTCACAG GGGCTATATG 
701 GAAATGTTCC ATCAGCAAAG AGAATGTTAC CAAGTTTCGG GAATACAAAG 
751 CCATTGAAAC CTTGGTGGGA CTTCTAACAG ATCAGCCTGA AGAAGTACTT 
801 GTGAATGTGG TTGGGGCCTT GGGAGAATGC TGCCAAGAAC GTGAAAACCG 
851 AGTCATTGTC CGGAAATGTG GTGGCATTCA ACCACTTGTG AACCTCCTTG 
901 TTGGAATAAA CCAAGCTCTT CTTGTGAATG TTACAAAAGC AGTTGGTGCT 
951 TGTGCAGTAG AACCTGAAAG TATGATGATA ATTGATCGCT TAGATGGAGT 
1001 TCGTTTGTTG TGGTCCCTGC TGAAAAATCC TCACCCAGAC GTGAAGGCCA 
1051 GCGCAGCATG GGCACTCTGT CCATGCATCA AAAATGCAAA GGATGCTGGG 
1101 GAAATGGTTC GTTCCTTTGT TGGTGGTTTG GAACTTATTG TCAATTTACT 
1151 GAAATCAGAT AACAAAGAAG TTCTGGCAAG TGTATGTGCT GCCATTACCA 
1201 ACATAGCAAA AGATCAAGAA AATTTAGCTG TTATCACAGA TCATGGAGTT 
1251 GTTCCTTTAT TGTCCAAACT GGCAAATACA AATAACAATA AATTGAGACA 
1301 TCATCTAGCA GAAGCTATTT CACGTTGCTG TATGTGGGGC AGGAATAGAG 
1351 TGGCCTTCGG TGAGCACAAA GCAGTGGCTC CACTAGTGCG TTATCTGAAA 
1401 TCAAATGACA CCAACGTGCA TCGGGCGACA GCTCAGGCCT TGTACCAACT 
1451 CTCAGAAGAC GCCGATAACT GCATCACCAT GCATGAGAAT GGTGCAGTAA 
1501 AGCTTCTACT GGATATGGTT GGGTCCCCTG ACCAGGATCT CCAGGAAGCT 
1551 GCAGCTGGTT GTATATCCAA TATCCGCAGG CTGGCTCTTG CTACAGAGAA 
1601 GGCAAGATAC ACTTGAAATT TAAATGGACA TTACAAGCTA TCAAATTCTA 
1651 CATGACACAG GACATGTCAC TCCCATGGCC AGAAAGCCTA AATTGGGAAA 
1701 CAGTTGTTAG CAAACCCTTT CAACCATCTA AATGAAAACA CACAAATTGA 
1751 AAATGCACAG AATGTTTTTC ATCTGAAAAT TGCATGGAGA CTTTTGTTTC 
1801 TATTTAATGT TTTCGAGATA TGACATGTGA TAAGATGGAA AGCCAATAAA 
1851 CCTGTGATAA GTTTCTAAGA ATATGAGAAT ATACGTATAT GATGTATTTT 
1901 TAGTTCAGTG ATGCTTTTGT ATTTGTGGCG ATTTTAATAA AGGATATGGC 
1951 CTTCCCAAAA AAAAAA 
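
The poly(A) stretch and polyadenylation signal positions quoted in the entry header can be cross-checked against the listed sequence. The following is a minimal sketch, assuming the canonical AATAAA hexamer as the signal and assuming the quoted positions are 0-based offsets into the sequence (an inference from this listing, not something the source states). The names `find_polya_features` and `min_run` are hypothetical.

```python
import re


def find_polya_features(seq, min_run=8):
    """Return (signal_offset, polya_offset) as 0-based offsets.

    Either value is None when the feature is not found.
    Assumption: the signal is the last AATAAA upstream of a terminal A run.
    """
    seq = "".join(seq.split()).upper()
    # Terminal run of at least `min_run` adenines.
    run = re.search(r"A{%d,}$" % min_run, seq)
    polya = run.start() if run else None
    # Search for the hexamer only upstream of the poly(A) run.
    limit = polya if polya is not None else len(seq)
    signal = seq.rfind("AATAAA", 0, limit)
    return (signal if signal != -1 else None, polya)


# Last 36 nucleotides of the sequence above; this tail begins at offset 1930.
tail = "ATTTTAATAA AGGATATGGC CTTCCCAAAA AAAAAA"
signal, polya = find_polya_features(tail)
print(1930 + signal, 1930 + polya)
```

Under these assumptions the tail yields offsets 5 and 26, which correspond to 1935 and 1956 once the tail's starting offset of 1930 is added back.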



BLAST Results 



No BLAST result 
Medline entries 



98413148: 

Yel013p (Vac8p), an armadillo repeat protein related to plakoglobin and 
importin alpha is associated with the yeast 
vacuole membrane. 

98330438: 

YEB3/VAC8 encodes a myristylated armadillo protein of the Saccharomyces 
cerevisiae vacuolar membrane that 
functions in vacuole fusion and inheritance. 

98158703: 

Vac8p, a vacuolar protein with armadillo repeats, functions in both 
vacuole inheritance and protein targeting from the 
cytoplasm to vacuole. 



Peptide information for frame 3 



ORF from 99 bp to 1613 bp; peptide length: 505 
Category: similarity to known protein 
Classification: unset 



1 MVNILDSPHK SLKCLAAETI ANVAKFKRAR RVVRQHGGIT KLVALLDCAH 

51 DSTKPAQSSL YEARDVEVAR CGALALWSCS KSHTNKEAIR KAGGIPLLAR 

101 LLKTSHENML IPVVGTLQEC ASEENYRAAI KAERIIENLV KNLNSENEQL 

151 QEHCAMAIYQ CAEDKETRDL VRLHGGLKPL ASLLNNTDNK ERLAAVTGAI 

201 WKCSISKENV TKFREYKAIE TLVGLLTDQP EEVLVNVVGA LGECCQEREN 

251 RVIVRKCGGI QPLVNLLVGI NQALLVNVTK AVGACAVEPE SMMIIDRLDG 

301 VRLLWSLLKN PHPDVKASAA WALCPCIKNA KDAGEMVRSF VGGLELIVNL 

351 LKSDNKEVLA SVCAAITNIA KDQENLAVIT DHGVVPLLSK LANTNNNKLR 

401 HHLAEAISRC CMWGRNRVAF GEHKAVAPLV RYLKSNDTNV HRATAQALYQ 

451 LSEDADNCIT MHENGAVKLL LDMVGSPDQD LQEAAAGCIS NIRRLALATE 

501 KARYT 
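
The peptide above is the conceptual translation of the ORF in the nucleotide sequence. As an illustration only, the sketch below translates a DNA string with the standard genetic code; the function name `translate` and its `start_bp` parameter (a 1-based start position) are hypothetical, and the example input is the first 30 nucleotides of the ORF, taken from position 99 of the listed sequence.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, start_bp=1):
    """Translate from a 1-based start position up to the first stop codon."""
    seq = "".join(dna.split()).upper()
    peptide = []
    for i in range(start_bp - 1, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks an unreadable codon
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# First 30 nt of the ORF (positions 99-128 of the listed sequence).
print(translate("ATGGTGAATATACTTGATTCTCCACACAAG"))
```

The example reproduces the first ten residues of the peptide listing, MVNILDSPHK.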



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_35pl7, frame 3 

PIR:S50446 VAC8 protein - yeast (Saccharomyces cerevisiae), N = 1, 
Score = 237, P = 7.8e-17 

PIR:T00403 T13E15.9 protein - Arabidopsis thaliana, N = 1, Score = 215, 
P = 4.9e-14 

TREMBL:DR41081_1 product: "b-catenin"; Danio rerio b-catenin mRNA, 
complete cds., N = 1, Score = 195, P = 5.8e-12 



>PIR:S50446 VAC8 protein - yeast (Saccharomyces cerevisiae) 
Length = 578 

HSPs: 



Score = 237 (35.6 bits), Expect = 7.8e-17, P = 7.8e-17 
Identities = 106/401 (26%), Positives = 177/401 (44%) 



Query: 92 AGGIPLLARLLKTSHENMLIPVVGTLQECASEENYRAAIKAERIIENLVKNLNSENEQLQ 151 
+GG PL A +N+ + L E Y + E ++E ++ L S++ Q+Q 
Sbjct: 45 SGG-PLKALTTLVYSDNLNLQRSAALAFAEITEKYVRQVSRE-VLEPILILLQSQDPQIQ 102 

Query: 152 EHCAMAIYQCAEDKETRDLVRLHGGLKPLASLLNNTDNKERLAAVTGAIWKCSISKENVT 211 
A+ A + E + L+ GGL+PL + + DN E G I + +N 
Sbjct: 103 VAACAALGNLAVNNENKLLIVEMGGLEPLINQMMG-DNVEVQCNAVGCITNLATRDDNKH 161 

Query: 212 KFREYKAIETLVGLLTDQPEEVLVNVVGALGECCQERENRVIVRKCGGIQPLVNLLVGIN 271 
K A+ L L + V N GAL ENR + G + LV+LL + 
Sbjct: 162 KIATSGALIPLTKLAKSKHIRVQRNATGALLNMTHSEENRKELVNAGAVPVLVSLLSSTD 221 

Query: 272 QALLVNVTKAVGACAVEPESMMIIDRLDG--VRLLWSLLKNPHPDVKASAAWALCPCIKN 329 
+ T A+ AV+ + + + + V L SL+ +P VK A AL + 
Sbjct: 222 PDVQYYCTTALSNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRVKCQATLALRNLASD 281 

Query: 330 AKDAGEMVRSFVGGLELIVNLLKSDNKE-VLASVCAAITNIAKDQENLAVITDHGVV-PL 387 
E+VR+ GGL +V L++SD+ VLASV A I NI+ N +I D G + PL 
Sbjct: 282 TSYQLEIVRA--GGLPHLVKLIQSDSIPLVLASV-ACIRNISIHPLNEGLIVDAGFLKPL 338 

Query: 388 LSKLANTNNNKLRHHLAEAISRCCMWG-RNRVAFGEHKAVAPLVRYLKSNDTNVHRATAQ 446 
+ L ++ +++ H + +NR F E AV + +V ++ 
Sbjct: 339 VRLLDYKDSEEIQCHAVSTLRNLAASSEKNRKEFFESGAVEKCKELALDSPVSV-QSEIS 397 

Query: 447 ALYQLSEDAD-NCITMHENGAVKLLLDMVGSPDQDLQEAAAGCISNI 492 
A++ AD+++E +L+MS +Q++ AA ++N+ 
Sbjct: 398 ACFAILALADVSKLDLLEANILDALIPMTFSQNQEVSGNAAAALANL 444 

Score = 213 (32.0 bits), Expect = 3.6e-14, P = 3.6e-14 
Identities = 81/341 (23%), Positives = 163/341 (47%) 

Query: 163 EDKETRDLVRLHGGLKPLASLLNNTD-NKERLAAVTGAIWKCSISKENVTKFREYKAIET 221 
EDK+ D G LK L +L+ + + N +R AA+ A I+++ V + + +E 
Sbjct: 36 EDKDQLDFYS-GGPLKALTTLVYSDNLNLQRSAALAFA EITEKYVRQVSR-EVLEP 89 

Query: 222 LVGLLTDQPEEVLVNVVGALGECCQERENRVIVRKCGGIQPLVNLLVGINQALLVNVTKA 281 
++ LL Q ++ V ALG EN++++ + GG++PL+N ++G N + N 
Sbjct: 90 ILILLQSQDPQIQVAACAALGNLAVNNENKLLIVEMGGLEPLINQMMGDNVEVQCNAVGC 149 

Query: 282 VGACAVEPESMMIIDRLDGVRLLWSLLKNPHPDVKASAAWALCPCIKNAKDAGEMVRSFV 341 
+ A ++ I + L L K+ H V+ +A AL + ++ E+V + 
Sbjct: 150 ITNLATRDDNKHKIATSGALIPLTKLAKSKHIRVQRNATGALLNMTHSEENRKELVNA-- 207 

Query: 342 GGLELIVNLLKSDNKEVLASVCAAITNIAKDQENLAVI--TDHGVVPLLSKLANTNNNKL 399 
G + ++V+LL S + +V A++NIA D+ N + T+ +V L L ++ ++++ 
Sbjct: 208 GAVPVLVSLLSSTDPDVQYYCTTALSNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRV 267 

Query: 400 RHHLAEAISRCCMWGRNRVAFGEHKAVAPLVRYLKSNDTNVHRATAQALYQLSEDADNCI 459 
+ A+ ++ + LV+ ++S+ + A+ + +S N 
Sbjct: 268 KCQATLALRNLASDTSYQLEIVRAGGLPHLVKLIQSDSIPLVLASVACIRNISIHPLNEG 327 

Query: 460 TMHENGAVKLLLDMVGSPDQDLQEAAAGCISNIRRLALATEKAR 503 
+ + G +K L+ ++ D + E +S +R LA ++EK R 
Sbjct: 328 LIVDAGFLKPLVRLLDYKDSE--EIQCHAVSTLRNLAASSEKNR 369 

Score = 180 (27.0 bits), Expect = 1.6e-10, P = 1.6e-10 
Identities = 80/346 (23%), Positives = 142/346 (41%) 

Query: 145 SENEQLQEHCAMAIYQCAEDKETRDLVRLHGGLKPLASLLNNTDNKERLAAVTGAIWKCS 204 
S+N LQ A+A + E K R + R L+P+ LL + D + ++AA A+ + 
Sbjct: 58 SDNLNLQRSAALAFAEITE-KYVRQVSR--EVLEPILILLQSQDPQIQVAACA-ALGNLA 113 

Query: 205 ISKENVTKFREYKAIETLVGLLTDQPEEVLVNVVGALGECCQERENRVIVRKCGGIQPLV 264 
++ EN E +E L+ + EV N VG + +N+ + G + PL 
Sbjct: 114 VNNENKLLIVEMGGLEPLINQMMGDNVEVQCNAVGCITNLATRDDNKHKIATSGALIPLT 173 

Query: 265 NLLVGINQALLVNVTKAVGACAVEPESMMIIDRLDGVRLLWSLLKNPHPDVKASAAWALC 324 
L + + N T A+ E+ + V +L SLL + PDV+ AL 
Sbjct: 174 KLAKSKHIRVQRNATGALLNMTHSEENRKELVNAGAVPVLVSLLSSTDPDVQYYCTTALS 233 

Query: 325 PCIKNAKDAGEMVRSFVGGLELIVNLLKSDNKEVLASVCAAITNIAKDQENLAVITDHGV 384 
++++++ + +V+L+ S + V A+ N+A D I G 
Sbjct: 234 NIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRVKCQATLALRNLASDTSYQLEIVRAGG 293 

Query: 385 VPLLSKLANTNNNKLRHHLAEAISRCCMWGRNRVAFGEHKAVAPLVRYLKSNDTNVHRAT 444 
+ P L KL +++ L I + N + + PLVR L D+ + 
Sbjct: 294 LPHLVKLIQSDSIPLVLASVACIRNISIHPLNEGLIVDAGFLKPLVRLLDYKDSEEIQCH 353 

Query: 445 A-QALYQLSEDAD-NCITMHENGAVKLLLDMVGSPDQDLQEAAAGCIS 490 
A L L+ ++ N E+GAV+ ++ +Q + C + 
Sbjct: 354 AVSTLRNLAASSEKNRKEFFESGAVEKCKELALDSPVSVQSEISACFA 401 

Score = 155 (23.3 bits), Expect = 8.8e-08, P = 8.8e-08 
Identities = 88/401 (21%), Positives = 175/401 (43%) 

Query: 60 LYEARD--VEVARCGALALWSCSKSHTNKEAIRKAGGI-PLLARLLKTSHENMLIPVVGT 116 
L +++D ++VA C AL + + ++ NK I + GG+ PL+ +++ + E + VG 
Sbjct: 93 LLQSQDPQIQVAACAALG--NLAVNNENKLLIVEMGGLEPLINQMMGDNVE-VQCNAVGC 149 

Query: 117 LQECASEENYRAAIKAERIIENLVKNLNSENEQLQEHCAMAIYQCAEDKETR-DLVRLHG 175 
+ A+ ++ + I + L K S++ ++Q + A+ +E R +LV G 
Sbjct: 150 ITNLATRDDNKHKIATSGALIPLTKLAKSKHIRVQRNATGALLNMTHSEENRKELVNA-G 208 

Query: 176 GLKPLASLLNNTDNKERLAAVTGAIWKCSISKENVTKFR--EYKAIETLVGLLTDQPEEV 233 
+ L SLL++TD + T A+ ++ + N K E + + LV L+ V 
Sbjct: 209 AVPVLVSLLSSTDPDVQYYCTT-ALSNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRV 267 

Query: 234 LVNVVGALGECCQERENRVIVRKCGGIQPLVNLLVGINQALLVNVTKAVGACAVEPESMM 293 

AL + ++ + + GG+ LV L+ + L++ + ++ P + 

Sbjct: 268 KCQATLALRNLASDTSYQLEIVRAGGLPHLVKLIQSDSIPLVLASVACIRNISIHPLNEG 327 

Query: 294 IIDRLDGVRLLWSLLK-NPHPDVKASAAWALCPCIKNA-KDAGEMVRSFVGGLELIVNLL 351 

+I ++ L LL +++ A L ++ K+ E S G +E L 

Sbjct: 328 LIVDAGFLKPLVRLLDYKDSEEIQCHAVSTLRNLAASSEKNRKEFFES--GAVEKCKELA 385 

Query: 352 KSDNKEVLA--SVCAAITNIAKDQENLAVITDHGVVPLLSKLANTNNNKLRHHLAEAISR 409 

V+ SCAI +AD L ++ + ++ L + +N++ + A A++ 
Sbjct: 386 LDSPVSVQSEISACFAIIALA-DVSKLDLL-EANILDALIPMTFSQNQEVSGNAAAALAN 443 

Query: 410 CCMWGRNRVAFGE HKAVAP-LVRYLKSNDTNVHRATAQALYQLSE 453 

C N E ++ + L+R+LKS+ + QL E 

Sbjct: 444 LCSRVNNYTKIIEAWDRPNEGIRGFLIRFLKSDYATFEHIALWTILQLLE 493 

Score = 139 (20.9 bits), Expect = 5.0e-06, P = 5.0e-06 
Identities = 80/329 (24%), Positives = 142/329 (43%) 

Query: 37 GGITKLVALLDCAHD-STKPAQ SSLYEARDVEVARCGALALWSCSKSHTNKEAIRKA 92 

G IT L DH+TA +L +++ + V R AL + + S N++ + A 
Sbjct: 148 GCITNLATRDDNKHKIATSGALIPLTKLAKSKHIRVQRNATGALLNMTHSEENRKELVNA 207 

Query: 93 GGIPLLARLLKTSHENMLIPVVGTLQECASEE-NYRAAIKAE-RIIENLVKNLNSENEQL 150 

G +P+L LL ++ ++ L A +E N + + E R++ LV ++S + ++ 

Sbjct: 208 GAVPVLVSLLSSTDPDVQYYCTTALSNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRV 267 

Query: 151 QEHCAMAIYQCAEDKETR-DLVRLHGGLKPLASLLNNTDNKERLAAVTGAIWKCSISKEN 209 

+ +A+ AD + ++VR GGL L L+ + D+ + A I SI N 
Sbjct: 268 KCQATLALRNLASDTSYQLEIVRA-GGLPHLVKLIQS-DSIPLVLASVACIRNISIHPLN 325 

Query: 210 VTKFREYKAIETLVGLLT-DQPEEVLVNVVGALGECCQERE-NRVIVRKCGGIQPLVNLL 267 

+ ++ LV LL EE+ +VL E NR +G++ L 

Sbjct: 326 EGLIVDAGFLKPLVRLLDYKDSEEIQCHAVSTLRNLAASSEKNRKEFFESGAVEKCKELA 385 

Query: 268 VG--INQALLVNVTKAVGACA-VEPESMMIIDRLDGVRLLWSLLKNPHPDVKASAAWA-L 323 

+ ++ ++ A+ A A V ++ + LD + + + +N A+AA A L 

Sbjct: 386 LDSPVSVQSEISACFAILALADVSKLDLLEANILDAL-IPMTFSQNQEVSGNAAAALANL 444 

Query: 324 CPCIKN-AKDAGEMVRSFVGGLELIVNLLKSD 354 

C + N K R G ++ LKSD 

Sbjct: 445 CSRVNNYTKIIEAWDRPNEGIRGFLIRFLKSD 476 

Score = 136 (20.4 bits), Expect = 1.1e-05, P = 1.1e-05 
Identities = 72/304 (23%), Positives = 133/304 (43%) 

Query: 58 SSLYEARDVEVARCGALALWSCSKSHTNKEAIRKAGGIPLLARLLKTSHENMLIPVVGTL 117 

+ L +++ + V R AL + + S N++ + AG +P+L LL ++ ++ L 
Sbjct: 173 TKLAKSKHIRVQRNATGALLNMTHSEENRKELVNAGAVPVLVSLLSSTDPDVQYYCTTAL 232 

Query: 118 QECASEE-NYRAAIKAE-RIIENLVKNLNSENEQLQEHCAMAIYQCAEDKETR-DLVRLH 174 

A +E N + + E R++ LV ++S + +++ +A+ AD + ++VR 
Sbjct: 233 SNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRVKCQATLALRNLASDTSYQLEIVRA- 291 

Query: 175 GGLKPLASLLNNTDNKERLAAVTGAIWKCSISKENVTKFREYKAIETLVGLLT-DQPEEV 233 

GGL L L+ + D+ + A I SI N + ++ LV LL EE+ 

Sbjct: 292 GGLPHLVKLIQS-DSIPLVLASVACIRNISIHPLNEGLIVDAGFLKPLVRLLDYKDSEEI 350 

Query: 234 LVNVVGALGECCQERE-NRVIVRKCGGIQPLVNLLVG--INQALLVNVTKAVGACA-VEP 289 

+ V L E NR + G ++ L + ++ ++ A+ A A V 

Sbjct: 351 QCHAVSTLRNLAASSEKNRKEFFESGAVEKCKELALDSPVSVQSEISACFAILALADVSK 410 

Query: 290 ESMMIIDRLDGVRLLWSLLKNPHPDVKASAAWA-LCPCIKN-AKDAGEMVRSFVGGLELI 347 

++ + LD + + + +N A+AA A LC + N K RG + 

Sbjct: 411 LDLLEANILDAL-IPMTFSQNQEVSGNAAAALANLCSRVNNYTKIIEAWDRPNEGIRGFL 469 

Query: 348 VNLLKSD 354 

+ LKSD 
Sbjct: 470 IRFLKSD 476 

Score = 114 (17.1 bits), Expect = 2.7e-03, P = 2.7e-03 
Identities = 71/335 (21%), Positives = 132/335 (39%) 

Query: 1 MVNILDSPHKSLKCLAAETIANVAKFKRARRVVRQHGGITKLVALLDCAHDSTKPAQSSL 60 

+ + SH++A +N+ +R++ G + LV+LL ST P 
Sbjct: 172 LTKLAKSKHIRVQRNATGALLNMTHSEENRKELVNAGAVPVLVSLLS STDP 222 

Query: 61 YEARDVEVARCGALALWSCSKSHTNKEAIRKAGGIPLLARLLKTSHENMLIPVVGTLQEC 120 
DV+ AL+ + +++ KA++LL++ + L+ 
Sbjct: 223 DVQYYCTTALSNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRVKCQATLALRNL 278 

Query: 121 ASEENYRAAIKAERIIENLVKNLNSENEQLQEHCAMAIYQCAEDKETRDLVRLHGGLKPL 180 

AS+ +Y+ I + +LVK + S++ L I + L+ G LKPL 

Sbjct: 279 ASDTSYQLEIVRAGGLPHLVKLIQSDSIPLVLASVACIRNISIHPLNEGLIVDAGFLKPL 338 

Query: 181 ASLLNNTDNKERLAAVTGAIWKCSISKE-NVTKFREYKAIETLVGLLTDQPEEVLVNVVG 239 

LL+ D++E + + S E N +F E A+E L D P V + 

Sbjct: 339 VRLLDYKDSEEIQCHAVSTLRNLAASSEKNRKEFFESGAVEKCKELALDSPVSVQSEISA 398 

Query: 240 ALGECCQERENRVIVRKCGGIQPLVNLLVGINQALLVNVTKAVG-ACAVEPESMMIIDRL 298 

+++ + + + L+ + NQ + N A+ C+ II+ 
Sbjct: 399 CFAIIALADVSKLDLLEANILDALIPMTFSQNQEVSGNAAAALANLCSRVNNYTKIIEAW 458 

Query: 299 D GVR-LLWSLLKNPHPDVKASAAWALCPCIKNAKDAGE 335 

D G+R L LK+ + + A W + +++ D E 
Sbjct: 459 DRPNEGIRGFLIRFLKSDYATFEHIALWTILQLLESHNDKVE 500 

Score = 106 (15.9 bits), Expect = 2.0e-02, P = 2.0e-02 
Identities = 49/204 (24%), Positives = 89/204 (43%) 

Query: 65 DVEVARCGALA-LWSCSKSHTNKEAIRKAGGIPLLARLLKTSHENMLIPVVGTLQECA-S 122 

+VEV +C A+ + + + NK I +G + L +L K+ H + G L S 

Sbjct: 139 NVEV-QCNAVGCITNLATRDDNKHKIATSGALIPLTKLAKSKHIRVQRNATGALLNMTHS 197 

Query: 123 EENYRAAIKAERIIENLVKNLNSENEQLQEHCAMAIYQCAEDKETRD-LVRLHGGL-KPL 180 

EEN + + A + LV L+S + +Q +C A+ A D+ R L + L L 
Sbjct: 198 EENRKELVNAGAV-PVLVSLLSSTDPDVQYYCTTALSNIAVDEANRKKLAQTEPRLVSKL 256 

Query: 181 ASLLNNTDNKERLAAVTGAIWKCSISKENVTKFREYKAIETLVGLLTDQPEEVLVNVVGA 240 

SL+++ ++ + A T A+ + + + LV L+ +++ V 

Sbjct: 257 VSLMDSPSSRVKCQA-TLALRNLASDTSYQLEIVRAGGLPHLVKLIQSDSIPLVLASVAC 315 

Query: 241 LGECCQERENRVIVRKCGGIQPLVNLL 267 

+ N ++ G ++PLV LL 

Sbjct: 316 IRNISIHPLNEGLIVDAGFLKPLVRLL 342 
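
The percentages shown in the HSP headers above follow from the identity and positive counts. The sketch below is illustrative only; it assumes the percentage is truncated rather than rounded, which is an inference from the reported values (for example, 88/401 is listed as 21% and 81/341 as 23%), and `percent_floor` is a hypothetical helper name.

```python
def percent_floor(matches, aligned_length):
    """Integer percentage, truncated toward zero (assumed reporting rule)."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100 * matches // aligned_length


# Identity and positive counts taken from the HSP headers above.
for matches, length in [(106, 401), (177, 401), (81, 341), (88, 401)]:
    print(f"{matches}/{length} ({percent_floor(matches, length)}%)")
```

Ordinary rounding would give 22% for 88/401 and 24% for 81/341, so truncation is the simpler reading of the reported figures.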

Pedant information for DKFZphtes3_35pl7, frame 3 



Report for DKFZphtes3_35pl7.3 

[LENGTH] 505 

[MW] 55224.34 

[pI] 8.43 

[HOMOL] PIR:S50446 VAC8 protein - yeast (Saccharomyces cerevisiae) 2e-16 

[FUNCAT] 30.25 vacuolar and lysosomal organization [S. cerevisiae, YEL013w] 8e-18 

[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YEL013w] 8e-18 

[FUNCAT] 09.25 vacuolar and lysosomal biogenesis [S. cerevisiae, YEL013w] 8e-18 

[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YNL189w] 3e-06 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YNL189w] 3e-06 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YNL189w] 3e-06 

[BLOCKS] BL01265C 

[BLOCKS] BL00242A Integrins alpha chain proteins 

[SCOP] d3bct 1.91.1.1.1 beta-Catenin [Mouse (Mus musculus)] 7e-18 

[PIRKW] cytosol 3e-11 

[PIRKW] apoptosis 3e-11 

[PIRKW] carcinogenesis 3e-11 

[PIRKW] cell adhesion 3e-11 

[PIRKW] cytoskeleton 3e-12 

[SUPFAM] pendulin 1e-07 

[KW] All_Alpha 

[KW] 3D 

[KW] LOW_COMPLEXITY 2.38 % 

SEQ MVNILDSPHKSLKCLAAETIANVAKFKRARRVVRQHGGITKLVALLDCAHDSTKPAQSSL 

SEG xxxxxxxxxxxx 

2bct- HH 



SEQ YEARDVEVARCGALALWSCSKSHTNKEAIRKAGGIPLLARLLKTSHENMLIPVVGTLQEC 

SEG 

2bct- HHCCCHHHHHHHHHHHHHHHHCHHHHHHHHHCCHHHHHHHGGGCCCHHHHHHHHHHHHHH 

SEQ ASEENYRAAIKAERIIENLVKNLNSENEQLQEHCAMAIYQCAEDKETRDLVRLHGGLKPL 

SEG 

2bct- HHTTTHHHHHHHHCHHHHHHHHHCCCCHHHHHHHHHHHHHHHTTHHHHHHHHHHCHHHHH 
SEQ ASLLNNTDNKERLAAVTGAIWKCSISKENVTKFREYKAIETLVGLLTDQPEEVLVNVVGA 

SEG 

2bct- HHHHH-HCCCHHHHHHHHHHHHHHCCCHHHHHHHHHCHHHHHHTTTTTCCHHHHHHHHHH 

SEQ LGECCQERENRVIVRKCGGIQPLVNLLVGINQALLVNVTKAVGACAVEPESMMIIDRLDG 

SEG 

2bct- H HHHHHCCCCTTTHHHHHHHHHHHHCTTTHHHHHHHHHTTTHHHHHHHH-HHCH 

SEQ VRLLWSLLKNPHPDVKASAAWALCPCIKNAKDAGEMVRSFVGGLELIVNLLKSDNKEVLA 

SEG 

2bct- HHHHHHHHHTTTHHHHHHHHHHHHHHHCCCCHH-HHHHHHHHHHHHHHHHCTTTTTHHHH 

SEQ SVCAAITNIAKDQENLAVITDHGVVPLLSKLANTNNNKLRHHLAEAISRCCMWGRNRVAF 

SEG 

2bct- HHHHHHHHHHHCGGGHHHHHHHCHHHHHHHHHHHHHHTTTCCHHHHHHHHHHHHCHHHHH 

SEQ GEHKAVAPLVRYLKSNDTNVHRATAQALYQLSEDADNCITMHENGAVKLLLDMVGSPDQD 

SEG 

2bct- HTTTHHHHHHHHHCCCCHHHHHHHHHHHHHHHTTHHHHHHHHHCCHHHHHHHTTTTTTHH 

SEQ LQEAAAGCISNIRRLALATEKARYT 

SEG 

2bct- HHHHHHHHH 

(No Prosite data available for DKFZphtes3_35pl7.3) 
(No Pfam data available for DKFZphtes3_35pl7.3) 
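
The LOW_COMPLEXITY keyword in the report above appears to be the fraction of residues masked with `x` on the SEG lines. The following is a small sketch under that assumption (the source does not define the value); `low_complexity_percent` is a hypothetical name.

```python
def low_complexity_percent(seg_lines, peptide_length):
    """Percentage of residues masked as low complexity ('x') by SEG."""
    masked = sum(line.lower().count("x") for line in seg_lines)
    return round(100.0 * masked / peptide_length, 2)


# The single non-empty SEG line of the 505-residue peptide has 12 masked residues.
print(low_complexity_percent(["xxxxxxxxxxxx"], 505))
```

Twelve masked residues out of 505 give 2.38 %, in line with the reported keyword value.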
DKFZphtes3_35p22 



group: cell cycle 

DKFZphtes3_35p22 encodes a novel 549 amino acid protein, with similarity to oncogene 1 (tre-2 
locus). 

The novel protein is closely related to human tre-2 and other enzymes involved in the 
degradation of ubiquitinated proteins. The human tre-2 oncogene encodes a deubiquitinating 
enzyme, indicating a role for the ubiquitin system in mammalian growth control. 

The novel protein can find application in cancer diagnostics and treatment, and in regulating 
protein stability and growth control via regulation of ubiquitination. 



strong similarity to oncogene 1 (tre-2 locus) 
membrane regions: 1 

complete cDNA, complete cds, EST hits 
Sequenced by DKFZ 
Locus: map="17" 
Insert length: 2072 bp 

Poly A stretch at pos. 2062, polyadenylation signal at pos. 2039 



1 GTTACACACA GGCAGTGGTA TCTGTGAGCA GCTCTGTGGA CTCAAAGGTT 
51 TTCTCCCTGA GAGGCATGAC CCAGGCCAGC TGATTCATCA GAATCAGGAT 
101 GGACGTGGTA GAGGTCGCGG GCAGTTGGTG GGCACAAGAG CGAGAGGACA 
151 TCATTATGAA ATACGAAAAG GGACACCGAG CTGGGCTGCC AGAGGACAAG 
201 GGGCCTAAGC CTTTTCGAAG CTACAACAAC AACGTCGATC ATTTGGGGAT 
251 TGTACATGAG ACGGAGCTGC CTCCTCTGAC TGCGCGGGAG GCGAAGCAAA 
301 TTCGGCGGGA GATCAGCCGA AAGAGCAAGT GGGTGGATAT GCTGGGAGAC 
351 TGGGAGAAAT ACAAAAGCAG CAGAAAGCTC ATAGATCGAG CGTACAAGGG 
401 AATGCCCATG AACATCCGGG GCCCGATGTG GTCAGTCCTC CTGAACACTG 
451 AGGAAATGAA GTTGAAAAAC CCCGGAAGAT ACCAGATCAT GAAGGAGAAG 
501 GGCAAGAAGT CATCTGAGCA CATCCAGCGC ATCGACCGGG ACGTAAGCGG 
551 GACATTAAGG AAGCATATAT TCTTCAGGGA TCGATACGGA ACCAAGCAGC 
601 GGGAACTACT CCACATCCTC CTGGCATATG AGGAGTACAA CCCGGAGGTG 
651 GGCTACTGCA GGGACCTGAG CCACATCGCC GCCTTGTTCC TCCTCTATCT 
701 TCCTGAGGAG GATGCATTCT GGGCACTGGT GCAGCTGCTG GCCAGTGAGA 
751 GGCACTCCCT GCAGGGATTT CACAGCCCAA ATGGCGGGAC CGTCCAGGGG 
801 CTCCAAGACC AACAGGAGCA TGTGGTAGCC ACGTCACAAC CCAAGACCAT 
851 GGGGCATCAG GACAAGAAAG ATCTATGTGG GCAGTGTTCC CCGTTAGGCT 
901 GCCTCATCCG GATATTGATT GACGGGATCT CTCTCGGGCT CACCCTGCGC 
951 CTGTGGGACG TGTATCTGGT AGAAGGCGAA CAGGCGCTGA TGCCGATAAC 
1001 AAGAATCGCC TTTAAGGTTC AGCAGAAGCG CCTCACGAAG ACGTCCAGGT 
1051 GTGGCCCGTG GGCACGTTTT TGCAACCGGT TCGTTGATAC CTGGGCCAGG 
1101 GATGAGGACA CTGTGCTCAA GCATCTTAGG GCCTCTATGA AGAAACTAAC 
1151 AAGAAAGAAG GGGGACCTGC CACCCCCAGC CAAACCCGAG CAAGGGTCGT 
1201 CGGCATCCAG GCCTGTGCCG GCTTCACGTG GCGGGAAGAC CCTCTGCAAG 
1251 GGGGACAGGC AGGCCCCTCC AGGCCCACCA GCCCGGTTCC CGCGGCCCAT 
1301 TTGGTCAGCT TCCCCGCCAC GGGCACCTCG TTCTTCCACA CCCTGTCCTG 
1351 GTGGGGCTGT CCGGGAAGAC ACCTACCCTG TGGGCACTCA GGGTGTGCCC 
1401 AGCCCGGCCC TGGCTCAGGG AGGACCTCAG GGTTCCTGGA GATTCCTGCA 
1451 GTGGAACTCC ATGCCCCGCC TCCCAACGGA CCTGGACGTA GAGGGCCCTT 
1501 GGTTCCGCCA TTATGATTTC AGACAGAGCT GCTGGGTCCG TGCCATATCC 
1551 CAGGAGGACC AGCTGGCCCC CTGCTGGCAG GCTGAACACC CTGCGGAGCG 
1601 GGTGAGATCG GCTTTCGCTG CACCCAGCAC TGATTCCGAC CAGGGCACCC 
1651 CCTTCAGAGC TAGGGACGAA CAGCAGTGTG CTCCCACCTC AGGGCCTTGC 
1701 CTCTGCGGCC TCCACTTGGA AAGTTCTCAG TTCCCTCCAG GCTTCTAGAA 
1751 GCATCTGGGC CAGGGCTCAT GGCTGGATAA TTTCCCTAGG CTTAACAACC 
1801 CAAGCAAGCT TCGCATCCTC GTTTTATTTT TGGTTAAACT TATGAAAATG 
1851 TATTAAGAAA GAGTGCAGCT CGAGAGAGAT TCAGAGATGG AACACACCAG 
1901 ACCCCAGATC ACAAAGCCAA CCATGCCCAG CCCCTCCCAG CACCCCCAGC 
1951 CCCACGACCA TCGTTCTGAA TTCTGACGAC ACCGTGAGCC TGCCTTTGTA 
2001 CTTCAAACTC ATGGAAGGAT AACCACCTTC ATGTTTTGAA ATAAATGTTT 
2051 CCTGTTGAAA TGAAAAAAAA AA 



BLAST Results 



Entry AC003976 from database EMBL: 

Homo sapiens chromosome 17, clone hCIT.91_J_4, complete sequence. 
Score = 4385, P = 0.0e+00, identities = 881/886 
14 exons 

Entry HSG19723 from database EMBL: 
human STS A001W35. 
Score = 850, P = 1.9e-32, identities = 170/170 



Medline entries 



92228503: 

A novel transcriptional unit of the tre oncogene widely 
expressed in human cancer cells. 

94067315: 

The yeast DOA4 gene encodes a deubiquitinating enzyme 
related to a product of the human tre-2 oncogene. 

95176708: 

UBP5 encodes a putative yeast ubiquitin-specific protease 
that is related to the human Tre-2 oncogene product. 



Peptide information for frame 3 



ORF from 99 bp to 1745 bp; peptide length: 549 
Category: strong similarity to known protein 
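
The peptide length follows from the ORF coordinates. The sketch below assumes the coordinates are 1-based and inclusive and that the end coordinate excludes the stop codon; this is an inference from the numbers in this listing, and `orf_peptide_length` is a hypothetical helper.

```python
def orf_peptide_length(start_bp, end_bp):
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


print(orf_peptide_length(99, 1745))
```

For this entry, (1745 - 99 + 1) / 3 = 549 residues, matching the stated peptide length.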



1 MDVVEVAGSW WAQEREDIIM KYEKGHRAGL PEDKGPKPFR SYNNNVDHLG 
51 IVHETELPPL TAREAKQIRR EISRKSKWVD MLGDWEKYKS SRKLIDRAYK 
101 GMPMNIRGPM WSVLLNTEEM KLKNPGRYQI MKEKGKKSSE HIQRIDRDVS 
151 GTLRKHIFFR DRYGTKQREL LHILLAYEEY NPEVGYCRDL SHIAALFLLY 
201 LPEEDAFWAL VQLLASERHS LQGFHSPNGG TVQGLQDQQE HVVATSQPKT 
251 MGHQDKKDLC GQCSPLGCLI RILIDGISLG LTLRLWDVYL VEGEQALMPI 
301 TRIAFKVQQK RLTKTSRCGP WARFCNRFVD TWARDEDTVL KHLRASMKKL 
351 TRKKGDLPPP AKPEQGSSAS RPVPASRGGK TLCKGDRQAP PGPPARFPRP 
401 IWSASPPRAP RSSTPCPGGA VREDTYPVGT QGVPSPALAQ GGPQGSWRFL 
451 QWNSMPRLPT DLDVEGPWFR HYDFRQSCWV RAISQEDQLA PCWQAEHPAE 
501 RVRSAFAAPS TDSDQGTPFR ARDEQQCAPT SGPCLCGLHL ESSQFPPGF 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35p22, frame 3 

PIR:S22155 oncogene 1 (tre-2 locus) (clone 210) - human, N = 1, Score = 
2181, P = 5.5e-226 

PIR:S57867 oncogene 1 - human, N = 1, Score = 1536, P = 1.2e-157 



>PIR:S22155 oncogene 1 (tre-2 locus) (clone 210) - human 
Length = 786 

HSPs: 

Score = 2181 (327.2 bits), Expect = 5.5e-226, P = 5.5e-226 
Identities = 405/500 (81%), Positives = 440/500 (88%) 

Query: 1 MDVVEVAGSWWAQEREDIIMKYEKGHRAGLPEDKGPKPFRSYNNNVDHLGIVHETELPPL 60 

MD+VE A S AQER+DI+MKY+KGHRAGLPEDKGP+P N+++D GI+HETELPP+ 
Sbjct: 1 MDMVENADSLQAQERKDILMKYDKGHRAGLPEDKGPEPV-GINSSIDRFGILHETELPPV 59 

Query: 61 TAREAKQIRREISRKSKWVDMLGDWEKYKSSRKLIDRAYKGMPMNIRGPMWSVLLNTEEM 120 

TAREAK+IRRE++R SKW++MLG+WE YK S KLIDR YKG+PMNIRGP+WSVLLN +E+ 
Sbjct: 60 TAREAKKIRREMTRTSKWMEMLGEWETYKHSSKLIDRVYKGIPMNIRGPVWSVLLNIQEI 119 

Query: 121 KLKNPGRYQIMKEKGKKSSEHIQRIDRDVSGTLRKHIFFRDRYGTKQRELLHILLAYEEY 180 

KLKNPGRYQIMKE+GK+SSEHI ID DV TLR H+FFRDRYG KQREL +ILLAY EY 
Sbjct: 120 KLKNPGRYQIMKERGKRSSEHIHHIDLDVRTTLRNHVFFRDRYGAKQRELFYILLAYSEY 179 

Query: 181 NPEVGYCRDLSHIAALFLLYLPEEDAFWALVQLLASERHSLQGFHSPNGGTVQGLQDQQE 240 

NPEVGYCRDLSHI ALFLLYLPEEDAFWALVQLLASERHSL GFHSPNGGTVQGLQDQQE 
Sbjct: 180 NPEVGYCRDLSHITALFLLYLPEEDAFWALVQLLASERHSLPGFHSPNGGTVQGLQDQQE 239 
Query: 241 HVVATSQPKTMGHQDKKDLCGQCSPLGCLIRILIDGISLGLTLRLWDVYLVEGEQALMPI 300 

HVV SQPKTM HQDK+ LCGQC+ LGCL+R LIDGISLGLTLRLWDVYLVEGEQ LMPI 
Sbjct: 240 HVVPKSQPKTMWHQDKEGLCGQCASLGCLLRNLIDGISLGLTLRLWDVYLVEGEQVLMPI 299 

Query: 301 TRIAFKVQQKRLTKTSRCGPWARFCNRFVDTWARDEDTVLKHLRASMKKLTRKKGDLPPP 360 

T I A KVQQKRL KTSRCG WAR N+F DTWA ++DTVLKHLRAS KKLTRK+GDLPPP 
Sbjct: 300 TSIALKVQQKRLMKTSRCGLWARLRNQFFDTWAMNDDTVLKHLRASTKKLTRKQGDLPPP 359 

Query: 361 AKPEQGSSASRPVPASRGGKTLCKGDRQAPPGPPARFPRPIWSASPPRAPRSSTPCPGGA 420 

AK EQGS A RPVPASRGGKTLCKG RQAPPGPPA+F RPI SASPP A R STPCPGGA 
Sbjct: 360 AKREQGSLAPRPVPASRGGKTLCKGYRQAPPGPPAQFQRPICSASPPWASRFSTPCPGGA 419 

Query: 421 VREDTYPVGTQGVPSPALAQGGPQGSWRFLQWNSMPRLPTDLDVEGPWFRHYDFRQSCWV 480 

VREDTYPVGTQGVPS ALAQGGPQGSWRFL+W SMPRLPTDLD+ GPWF HYDF +SCWV 
Sbjct: 420 VREDTYPVGTQGVPSLALAQGGPQGSWRFLEWKSMPRLPTDLDIGGPWFPHYDFERSCWV 479 

Query: 481 RAISQEDQLAPCWQAEHPAE 500 

RAISQEDQLA CWQAEH E 
Sbjct: 480 RAISQEDQLATCWQAEHCGE 499 



Pedant information for DKFZphtes3_35p22, frame 3 



Report for DKFZphtes3_35p22.3 



[LENGTH] 549 

[MW] 62159.16 

[pI] 9.23 

[HOMOL] PIR:S22155 oncogene 1 (tre-2 locus) (clone 210) - human 0.0 

[FUNCAT] 11.01 stress response [S. cerevisiae, YGR100w] 2e-16 

[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YGR100w] 2e-16 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YNL293w] 3e-15 

[PIRKW] transmembrane protein 6e-14 

[PROSITE] MYRISTYL 6 

[PROSITE] AMIDATION 1 

[PROSITE] CAMP_PHOSPHO_SITE 3 

[PROSITE] CK2_PHOSPHO_SITE 4 

[PROSITE] TYR_PHOSPHO_SITE 2 

[PROSITE] PKC_PHOSPHO_SITE 10 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 5.28 % 


SEQ MDVVEVAGSWWAQEREDIIMKYEKGHRAGLPEDKGPKPFRSYNNNVDHLGIVHETELPPL 

SEG 

PRD ccceeeccchhhhhhhhhhhhhhccccccccccccccceeeeeccccccccccccccccc 

MEM 

SEQ TAREAKQIRREISRKSKWVDMLGDWEKYKSSRKLIDRAYKGMPMNIRGPMWSVLLNTEEM 

SEG 

PRD chhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhcccccccccceeeccccccc 

MEM 

SEQ KLKNPGRYQIMKEKGKKSSEHIQRIDRDVSGTLRKHIFFRDRYGTKQRELLHILLAYEEY 

SEG 

PRD ccccccchhhhhhhccccchhhhhhhhhhhhccccccccccccccchhhhhhhhhhhhhc 

MEM 

SEQ NPEVGYCRDLSHIAALFLLYLPEEDAFWALVQLLASERHSLQGFHSPNGGTVQGLQDQQE 

SEG 

PRD ccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhhhhh 

MEM 

SEQ HVVATSQPKTMGHQDKKDLCGQCSPLGCLIRILIDGISLGLTLRLWDVYLVEGEQALMPI 

SEG 

PRD hhhhhhhchhhhhhhhccccccccchhhhhhhhhhccccchhhhhhhhhccccceeeehh 

MEM MMMMMMMMMMMMMMMMMM 

SEQ TRIAFKVQQKRLTKTSRCGPWARFCNRFVDTWARDEDTVLKHLRASMKKLTRKKGDLPPP 

SEG 

PRD hhhhhhhhhhhhhhhcccchhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhcccc 

MEM 

SEQ AKPEQGSSASRPVPASRGGKTLCKGDRQAPPGPPARFPRPIWSASPPRAPRSSTPCPGGA 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxx. . . 

PRD ccccccccccccccccccceeeeccccccccccccccccccccccccccccccccccccc 

MEM 
SEQ VREDTYPVGTQGVPSPALAQGGPQGSWRFLQWNSMPRLPTDLDVEGPWFRHYDFRQSCWV 

SEG 

PRD cccccccccccccccccccccccccceeeeeccccccccccccccccccccccccccccc 

MEM 

SEQ RAISQEDQLAPCWQAEHPAERVRSAFAAPSTDSDQGTPFRARDEQQCAPTSGPCLCGLHL 

SEG 

PRD cchhhhhhhhhhhhhhcchhhhhhhhccccccccccccccchhhhhcccccccccceeee 

MEM 

SEQ ESSQFPPGF 

SEG 

PRD ccccccccc 

MEM 



Prosite for DKFZphtes3_35p22.3 



PS00004  136->140  CAMP_PHOSPHO_SITE  PDOC00004 
PS00004  310->314  CAMP_PHOSPHO_SITE  PDOC00004 
PS00004  348->352  CAMP_PHOSPHO_SITE  PDOC00004 
PS00005  61->64    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  73->76    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  90->93    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  152->155  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  216->219  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  282->285  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  315->318  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  346->349  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  351->354  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  446->449  PKC_PHOSPHO_SITE   PDOC00005 
PS00006  61->65    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  460->464  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  484->488  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  511->515  CK2_PHOSPHO_SITE   PDOC00006 
PS00007  93->100   TYR_PHOSPHO_SITE   PDOC00007 
PS00007  92->100   TYR_PHOSPHO_SITE   PDOC00007 
PS00008  8->14     MYRISTYL           PDOC00008 
PS00008  101->107  MYRISTYL           PDOC00008 
PS00008  230->236  MYRISTYL           PDOC00008 
PS00008  276->282  MYRISTYL           PDOC00008 
PS00008  366->372  MYRISTYL           PDOC00008 
PS00008  441->447  MYRISTYL           PDOC00008 
PS00009  134->138  AMIDATION          PDOC00009 
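
The short phosphorylation-site entries in the Prosite listing above can be reproduced approximately with regular expressions. This is a sketch only: the three patterns are the commonly cited PROSITE consensus forms and are stated here as assumptions, the `start->end` convention (1-based start, end equal to start plus motif length) is inferred from the listing, and `scan_motif` is a hypothetical name.

```python
import re

# Assumed consensus patterns, written as regular expressions.
PATTERNS = {
    "CAMP_PHOSPHO_SITE": r"[RK]{2}.[ST]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
}


def scan_motif(peptide, name, offset=0):
    """Return (start, end) pairs for every match, including overlapping ones.

    `offset` is the number of residues preceding `peptide` in the full protein.
    """
    # A lookahead lets overlapping sites be reported.
    regex = re.compile("(?=(%s))" % PATTERNS[name])
    hits = []
    for match in regex.finditer(peptide.upper()):
        start = match.start(1) + 1 + offset
        hits.append((start, start + len(match.group(1))))
    return hits


# Residues 51-70 of the peptide listed earlier for this entry.
fragment = "IVHETELPPLTAREAKQIRR"
print(scan_motif(fragment, "PKC_PHOSPHO_SITE", offset=50))
print(scan_motif(fragment, "CK2_PHOSPHO_SITE", offset=50))
```

On this fragment the sketch reports a PKC site at 61->64 and a CK2 site at 61->65, both of which appear in the listing.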



(No Pfam data available for DKFZphtes3_35p22.3) 
DKFZphtes3_4b4 



group: testes derived 

DKFZphtes3_4b4 encodes a novel 497 amino acid protein similar to SCP proteins and a human 
trypsin inhibitor. 

The novel protein contains an extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 signature 2, 
predicted by Prosite and Pfam. This domain is found in a variety of extracellular proteins 
from eukaryotes that have been found to be evolutionarily related. The exact function of these 
proteins is not yet known. In addition, the protein is similar to a human trypsin inhibitor. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes or as a new protease inhibitor. 



strong similarity to trypsin inhibitor 
might be a new protease inhibitor? 
Sequenced by AGOWA 

Locus: /map="333.4 cR from top of Chr16 linkage group" 
Insert length: 4574 bp 

Poly A stretch at pos. 4551, polyadenylation signal at pos. 4539 



1 GGCGGCTGCT CCCATTGAGC TGTCTGCTCG CTGTGCCCGC TGTGCCTGCT 
51 GTGCCCGCGC TGTCGCCGCT GCTACCGCGT CTGCTGGACG CGGGAGACGC 
101 CAGCGAGCTG GTGATTGGAG CCCTGCGGAG AGCTCAAGCG CCCAGCTCTG 
151 CCCGAGGAGC CCAGGCTGCC CCGTGAGTCC CATAGTTGCT GCAGGAGTGG 
201 AGCCATGAGC TGCGTCCTGG GTGGTGTCAT CCCCTTGGGG CTGCTGTTCC 
251 TGGTCTGCGG ATCCCAAGGC TACCTCCTGC CCAACGTCAC TCTCTTAGAG 
301 GAGCTGCTCA GCAAATACCA GCACAACGAG TCTCACTCCC GGGTCCGCAG 
351 AGCCATCCCC AGGGAGGACA AGGAGGAGAT CCTCATGCTG CACAACAAGC 
401 TTCGGGGCCA GGTGCAGCCT CAGGCCTCCA ACATGGAGTA CATGACCTGG 
451 GATGACGAAC TGGAGAAGTC TGCTGCAGCG TGGGCCAGTC AGTGCATCTG 
501 GGAGCACGGG CCCACCAGTC TGCTGGTGTC CATCGGGCAG AACCTGGGCG 
551 CTCACTGGGG CAGGTATCGC TCTCCGGGGT TCCATGTGCA GTCCTGGTAT 
601 GACGAGGTGA AGGACTACAC CTACCCCTAC CCGAGCGAGT GCAACCCCTG 
651 GTGTCCAGAG AGGTGCTCGG GGCCTATGTG CACGCACTAC ACACAGATAG 
701 TTTGGGCCAC CACCAACAAG ATCGGTTGTG CTGTGAACAC CTGCCGGAAG 
751 ATGACTGTCT GGGGAGAAGT TTGGGAGAAC GCGGTCTACT TTGTCTGCAA 
801 TTATTCTCCA AAGGGGAACT GGATTGGAGA AGCCCCCTAC AAGAATGGCC 
851 GGCCCTGCTC TGAGTGCCCA CCCAGCTATG GAGGCAGCTG CAGGAACAAC 
901 TTGTGTTACC GAGAAGAAAC CTACACTCCA AAACCTGAAA CGGACGAGAT 
951 GAATGAGGTG GAAACGGCTC CCATTCCTGA AGAAAACCAT GTTTGGCTCC 
1001 AACCGAGGGT GATGAGACCC ACCAAGCCCA AGAAAACCTC TGCGGTCAAC 
1051 TACATGACCC AAGTCGTCAG ATGTGACACC AAGATGAAGG ACAGGTGCAA 
1101 AGGGTCCACG TGTAACAGGT ACCAGTGCCC AGCAGGCTGC CTGAACCACA 
1151 AGGCGAAGAT CTTTGGAACT CTGTTCTATG AAAGCTCGTC TAGCATATGC 
1201 CGCGCCGCCA TCCACTACGG GATCCTGGAT GACAAGGGAG GCCTGGTGGA 
1251 TATCACCAGG AACGGGAAGG TCCCCTTCTT CGTGAAGTCT GAGAGACACG 
1301 GCGTGCAGTC CCTCAGCAAA TACAAACCTT CCAGCTCATT CATGGTGTCA 
1351 AAAGTGAAAG TGCAGGATTT GGACTGCTAC ACGACCGTTG CTCAGCTGTG 
1401 CCCGTTTGAA AAGCCAGCAA CTCACTGCCC AAGAATCCAT TGTCCGGCAC 
1451 ACTGCAAAGA CGAACCTTCC TACTGGGCTC CGGTGTTTGG AACCAACATC 
1501 TATGCAGATA CCTCAAGCAT CTGCAAGACA GCCGTGCACG CGGGAGTCAT 
1551 CAGCAACGAG AGTGGGGGTG ACGTGGACGT GATGCCCGTG GATAAAAAGA 
1601 AGACCTACGT GGGCTCGCTC AGGAATGGAG TTCAGTCTGA AAGCCTGGGG 
1651 ACTCCTCGGG ATGGAAAGGC CTTCCGGATC TTTGCTGTCA GGCAGTGAAT 
1701 TTCCAGCACC AGGGGAGAAG GGGCGTCTTC AGGAGGGCTT CGGGGTTTTG 
1751 CTTTTATTTT TATTTTGTCA TTGCGGGGTA TATGGAGAGT CAGGAAACTT 
1801 CCTTTGACTG ATGTTCAGTG TCCATCACTT TGTGGCCTGT GGGTGAGGTG 
1851 ACATCTCATC CCCTCACTGA AGCAACAGCA TCCCAAGGTG CTCAGCCGGA 
1901 CTCCCTGGTG CCTGATCCTG CTGGGGCCCG GGGGTCTCCA TCTGGACGTC 
1951 CTCTCTCCTT TAGAGATCTG AGCTGTCTCT TAAAGGGGAC AGTTGCCCAA 
2001 AATGTTCCTT GCTATGTGTT CTTCTGTTGG TGGAGGAAGT TGATTTCAAC 
2051 CTCCCTGCCA AAAGAACAAA CCATTTGAAG CTCACAATTG TGAAGCATTC 
2101 ACGGCGTCGG AAGAGGCCTT TTGAGCAAGC GCCAATGAGT TTCAGGAATG 
2151 AAGTAGAAGG TAGTTATTTA AAAATAAAAA ACACAGTCCG TCCCTACCAA 
2201 TAGAGGAAAA TGGTTTTAAT GTTTGCTGGT CAGACAGACA AATGGGCTAG 
2251 AGTAAGAGGG CTGCGGGTAT GAGAGACCCC GGCTCCGCCC TGGCACGTGT 
2301 CCTTGCTGGC GGCCCGCCAC AGGCCCCCTT CAATGGCCGC ATTCAGGATG 
2351 GCTCTATACA CAGCAGTGCT GGTTTATGTA GAGTTCAGCA GTCACTTCAG 
2401 AGATGTATCT TGTCTTTGTC AGGCCCTTCA TCTTCATGGC CCACCTGTTT 
2451 TCTGCCGTGA CCTTTGGTCC CATTGAGGAC TAAGGATCGG GACCCTTTCT 
2501 TTACCCCCTA CCCATTGTGG CTCCCACCCT GCCTCGGACT GGTTTACGTG 
2551 TCCTGGTTCA CACCCAGGAC TTTTCTTTGC AAGCGAACCT GTTTGAAGCC 
2601 CAAGTCTTAA CTCCTGGTCT CGTAAGGTTC CACTGAGACG AGATGTCTGA 
2651 GAACAACCAA AGAAGGCCTG CTCTTTGCTG CTTTTAAAAA ATGACAATTA 
2701 AATGTGCAGA TTCCCCACGC ACCCGATGAC CTATTTTTTC AGCCGTGGGA 
2751 GGAATGGAGT CTTTGGTACA TTCCTCACCG AGGTTAGCAG CTCAGTTTGT 
2801 GGTTATGAAA CCGTCTGTGG CCTCATGACA GCGAGAGATG GGAATACACT 
2851 AGAAGGATCT CTTTTCCTGT TTTCGTGAAA CGACTCTTGC CAAACGTTCC 
2901 CGAGGCGCCA AGGAGTGTAG TACACCCTGG CTGCCATCAC TCTATAAAAG 
2951 TGCTTCATGA GCCCAGACCA AAAGCCCACA GTGAAATGAA GTACCCTTTT 
3001 GTAAATAGCA TTTTTTTGCA GAAGGTGAAA ATTCCACTCT CTACCACCGG 
3051 GCCAGCCAAT AGATCACTTT GGTGAATGCT AGTTTCAAAT TTGATTCAAA 
3101 ATATTTCTTA GGTGAAAGAA CTAGCAGAAA GTCAAAAACT AAGATACTGT 
3151 AGACTGGACA AGAAATTCTA CCTGGGCACC TAGGTGATGC CTTCTTTCTT 
3201 TGATTGCCTT TCTAATAAAT GCAGAATCTG AAGGTAAATA GGTTTAAAAC 
3251 AAAACAAAAA CCCACCCCTT TAAGGAGTTG GTAAAAAGCA GTTCAACTCT 
3301 TAGCTTGACT GAGCTAAAAT TCACAGGACT ACGTGCTTTG TGCATTGTAG 
3351 TCTAGTCGTA ATTCATAGGT ACTGACTCCT CAGCCCCAAA TGTCGGAGAG 
3401 GAAGAATTCG GTCAGCCTGT CAGGTCGTGA GTCCAGTTAC CACCAAACAT 
3451 CTGGGAAACT TCTGGGTGCT GGGTGCTCTG CTGCTGGACT TTTGTGGCTG 
3501 TGTCTGTGTC TGCAAGATAA ATTAGATCGC CCTGTGGGGT TTGCAGAATT 
3551 AGTGAAGGGT CCAGGACGAT CCCAGTGGGC TCGCTTCCAA AGCATCCCAC 
3601 TCAAGGGAGA CTTGAAACTT CCAGTGTGAG TTGACCCCAT CATTTAAAAA 
3651 TAAAGTCCCC GGGTTCCTTA ATGCCTCCTT CACTGGGCCT TCCTAGCAGG 
3701 ATAGAAAGTC CTTGCCCAGA GCAGGACCTG GCTGTCTTTT TTTTTTTTTT 
3751 TTTCCCGAGA CCAAGTTTCA CTCTGTTGCC CAAGGTAGAG TGCAGTGGCG 
3801 TGATCTCTGC TCATTGCAAC TGCCGCCTCC CGGGTTCAAG CAATTCTCAT 
3851 GCATCAGCCT CCCAAGTACC TGGGACTACA GGCGTGAGCT ACCATGCCCG 
3901 GCTAATTTTT GTATTTTTAG TAGAGATGGG GTTTCATTAT GTTGGCCAGG 
3951 CTGGTCTCGA ACTCCTTACC TCAGGTGATC CACCCACCTT GGCCTCCCGA 
4001 AGTGCTGGGA TTACAGGCAT GAGCCACTGC GCCCGGCCAT GGACCTGGCT 
4051 GTCTTTATCA TCCCCACAAA CATTTTGAAA CTGGAATATT TGTCTTCAGA 
4101 AAATGGAAAC AAGACTATAA ATGATAAGCC CTGTCCCTAG CACCACCTCT 
4151 CCTGTGTGTG GAATAGAGGC CCCTCGTGCT ACCAACACTT ACCCTGTGTT 
4201 TAAAAAGATC TTGTACCAAG CCAACGGCGT TCCTGGCTCT CCTGCCCACA 
4251 GGATGAACAT TTTCGGCTTC CTTAGGAGTT TTGCCCTACC GTATTCCAAA 
4301 GCGTGTGCTG GTTTCTCATA TTGTCTGTAG GCTCACTCAG CCCGCAGTTT 
4351 ATGTGTGTGC TTTTTTCTAT GAAAAATGAT GTATTTTGCT ACTTCCTGTG 
4401 TACAAAGTTT TATTGTAAAT GTTTTTTGTG CTTTGCATGA ACAGGGGCCA 
4451 CGTTGTTGCA ATTGTTTCAG TAGAACTGGT TTGATTTCTA AAATGTTCCT 
4501 GTAACATATC TTTTATGAAC AAATCTGAAC AATTTGTGAA ATAAAACATT 
4551 GAAAACCAAA AAAAAAAAAA AAAA 



BLAST Results 



Entry HS834352 from database EMBL: 
human STS WI-15502. 
Score = 1331, P = 5.4e-54, identities = 287/301 



Medline entries 



98146272: 

cDNA cloning of a novel trypsin inhibitor with similarity to 
pathogenesis-related proteins, and its frequent expression in human 
brain cancer cells. 



Peptide information for frame 1 



ORF from 205 bp to 1695 bp; peptide length: 497 
Category: strong similarity to known protein 
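
The ORF coordinates and peptide lengths quoted in these reports can be cross-checked with a few lines of Python. This is a minimal sketch, assuming the coordinates are 1-based and inclusive and that the stop codon is not counted; the function name `orf_summary` is hypothetical and not part of the original annotation pipeline.

```python
def orf_summary(start, end):
    """Return (reading frame 1-3, codon count) for an ORF.

    Assumes 1-based, inclusive nucleotide coordinates that exclude
    the stop codon (an assumption, not stated in the listing).
    """
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    frame = (start - 1) % 3 + 1
    return frame, length // 3


# ORF from 205 bp to 1695 bp, reported as frame 1, peptide length 497
print(orf_summary(205, 1695))
```

Under the same assumptions, the other two ORFs in this section (57 to 2024 bp and 762 to 3131 bp) give frame 3 with 656 and 790 codons, in agreement with the reported frames and peptide lengths.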



1 MSCVLGGVIP LGLLFLVCGS QGYLLPNVTL LEELLSKYQH NESHSRVRRA 

51 IPREDKEEIL MLHNKLRGQV QPQASNMEYM TWDDELEKSA AAWASQCIWE 

101 HGPTSLLVSI GQNLGAHWGR YRSPGFHVQS WYDEVKDYTY PYPSECNPWC 

151 PERCSGPMCT HYTQIVWATT NKIGCAVNTC RKMTVWGEVW ENAVYFVCNY 

201 SPKGNWIGEA PYKNGRPCSE CPPSYGGSCR NNLCYREETY TPKPETDEMN 

251 EVETAPIPEE NHVWLQPRVM RPTKPKKTSA VNYMTQVVRC DTKMKDRCKG 

301 STCNRYQCPA GCLNHKAKIF GTLFYESSSS ICRAAIHYGI LDDKGGLVDI 

351 TRNGKVPFFV KSERHGVQSL SKYKPSSSFM VSKVKVQDLD CYTTVAQLCP 

401 FEKPATHCPR IHCPAHCKDE PSYWAPVFGT NIYADTSSIC KTAVHAGVIS 

451 NESGGDVDVM PVDKKKTYVG SLRNGVQSES LGTPRDGKAF RIFAVRQ 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_4b4, frame 1 

TREMBLNEW:AF109674_1 gene: "Lgl1"; product: "late gestation lung 
protein 1"; Rattus norvegicus late gestation lung protein 1 (Lgl1) 
mRNA, complete cds., N = 1, Score = 968, P = 1.9e-97 

TREMBL:D45027_1 product: "25 kDa trypsin inhibitor"; Homo sapiens mRNA 
for 25 kDa trypsin inhibitor, complete cds., N = 1, Score = 738, P = 
4.5e-73 

TREMBL:AB009609_1 gene: "HrTT-1"; Halocynthia roretzi HrTT-1 mRNA, 
complete cds., N = 1, Score = 345, P = 2e-31 

PIR:JC5308 testis-specific, vespid, and pathogenesis-related protein 1 
precursor - human, N = 1, Score = 337, P = 1.7e-30 



>TREMBLNEW:AF109674_1 gene: "Lgl1"; product: "late gestation lung protein 
1"; Rattus norvegicus late gestation lung protein 1 (Lgl1) mRNA, complete 
cds. 

Length = 188 

HSPs: 

Score = 968 (145.2 bits), Expect = 1.9e-97, P = 1.9e-97 
Identities = 160/185 (86%), Positives = 170/185 (91%) 

Query: 61 MLHNKLRGQVQPQASNMEYMTWDDELEKSAAAWASQCIWEHGPTSLLVSIGQNLGAHWGR 120 

MLHNKLRGQV P ASNMEYMTWD+ELE+SAAAWA +C+WEHGP SLLVSIGQNL HWGR 
Sbjct: 1 MLHNKLRGQVYPPASNMEYMTWDEELERSAAAWAQRCLWEHGPASLLVSIGQNLAVHWGR 60 

Query: 121 YRSPGFHVQSWYDEVKDYTYPYPSECNPWCPERCSGPMCTHYTQIVWATTNKIGCAVNTC 180 

YRSPGFHVQSWYDEVKDYTYPYP ECNPWCPERCSG MCTHYTQ+VWATTNKIGCAV+TC 
Sbjct: 61 YRSPGFHVQSWYDEVKDYTYPYPHECNPWCPERCSGAMCTHYTQMVWATTNKIGCAVHTC 120 

Query: 181 RKMTVWGEVWENAVYFVCNYSPKGNWIGEAPYKNGRPCSECPPSYGGSCRNNLCYREETY 240 

R M+VWG++WENAVY VCNYSPKGNWIGEAPYK+GRPCSECP SYGG CRNNLCYREE Y 
Sbjct: 121 RSMSVWGDIWENAVYLVCNYSPKGNWIGEAPYKHGRPCSECPSSYGGGCRNNLCYREEHY 180 

Query: 241 TPKPE 245 
KPE 

Sbjct: 181 HQKPE 185 
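
The percentages printed in the HSP headers of this section appear to be truncated rather than rounded: 170/185 is about 91.9 % but is reported as 91 %. The sketch below reproduces that behaviour with integer division; the helper name `percent_truncated` is hypothetical, and truncation is an inference from the figures in this listing, not a documented property of the BLAST version used.

```python
def percent_truncated(matches, aligned):
    """Integer percentage, truncated toward zero (assumed BLAST header convention)."""
    return (100 * matches) // aligned


# Identities = 160/185 and Positives = 170/185 from the HSP above
print(percent_truncated(160, 185), percent_truncated(170, 185))
```

The same rule also matches the other HSP headers in this section, for example 100/336 (29 %) and 167/336 (49 %).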



Pedant information for DKFZphtes3_4b4, frame 1 



Report for DKFZphtes3_4b4.1 

[LENGTH] 497 
[MW] 55920.00 
[pi] 8.36 
[HOMOL] TREMBL:D45027_1 product: "25 kDa trypsin inhibitor"; Homo sapiens mRNA for 25 
kDa trypsin inhibitor, complete cds. 6e-78 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YJL078C] 8e-12 
[BLOCKS] BL01009E Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 proteins 
[BLOCKS] BL01009D Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 proteins 
[BLOCKS] BL01009C Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 proteins 
[BLOCKS] BL01009A Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 proteins 
[PIRKW] glycoprotein 5e-22 
[PIRKW] blocked amino end 5e-13 
[PIRKW] brain 9e-30 
[PIRKW] hydrolase 4e-09 
[PIRKW] hemolymph coagulation 4e-09 
[PIRKW] zymogen 4e-09 
[PIRKW] alternative splicing 4e-09 
[PIRKW] sperm 5e-22 
[PIRKW] viroid-induced protein 2e-11 
[PIRKW] venom 6e-18 
[PIRKW] pyroglutamic acid 2e-11 
[PIRKW] transmembrane protein 2e-10 
[PIRKW] serine proteinase 4e-09 
[SUPFAM] C-type lectin homology 4e-09 
[SUPFAM] trypsin homology 4e-09 
[SUPFAM] complement factor H repeat homology 4e-09 
[SUPFAM] cysteine-rich secretory protein 1 6e-24 
[SUPFAM] pathogenesis-related leaf protein 7e-15 
[PROSITE] MYRISTYL 8 
[PROSITE] CAMP_PHOSPHO_SITE 3 
[PROSITE] CK2_PHOSPHO_SITE 6 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 8 
[PROSITE] ASN_GLYCOSYLATION 3 
[PROSITE] SCP_AG5_PR1_SC7_2 1 
[PFAM] SCP-like extracellular Proteins 
[KW] All_Beta 
[KW] SIGNAL_PEPTIDE 23 
[KW] LOW_COMPLEXITY 1.21 % 

SEQ MSCVLGGVIPLGLLFLVCGSQGYLLPNVTLLEELLSKYQHNESHSRVRRAIPREDKEEIL 
SEG xxxxxx 



PRD ccceeeeeceeeeeeeecccccccccchhhhhhhhhhhhhcccchhhhhhhccchhhhhh 

SEQ MLHNKLRGQVQPQASNMEYMTWDDELEKSAAAWASQCIWEHGPTSLLVSIGQNLGAHWGR 

SEG 

PRD hhhhhhhcccccccccchhhhhhhhhhhhhhhhhhhhcccccccccccccccceeeeecc 

SEQ YRSPGFHVQSWYDEVKDYTYPYPSECNPWCPERCSGPMCTHYTQIVWATTNKIGCAVNTC 

SEG 

PRD ccccchhhhhhhhhhhccccccccccccccccccccccccceeeeeeeccccccceeeec 

SEQ RKMTVWGEVWENAVYFVCNYSPKGNWIGEAPYKNGRPCSECPPSYGGSCRNNLCYREETY 

SEG 

PRD cccccccccccceeeeeeeccccccccccccccccccccccccccccccccccccccccc 

SEQ TPKPETDEMNEVETAPIPEENHVWLQPRVMRPTKPKKTSAVKYMTQVVRCDTKMKDRCKG 

SEG 

PRD cccccccccccccccccccceeeeecccccccccccceeeeeeeeeeeeecccccccccc 

SEQ STCNRYQCPAGCLNHKAKIFGTLFYESSSSICRAAIHYGILDDKGGLVDITRNGKVPFFV 

SEG 

PRD ccccccccccccccccceeeeeeeeecccceeeeeccccccccccceeeeeccccceeee 

SEQ KSERHGVQSLSKYKPSSSFMVSKVKVQDLDCYTTVAQLCPFEKPATHCPRIHCPAHCKDE 

SEG 

PRD eccceeeeeeeeccccceeeeeeeeeecccceeeeeeeeccccccccccccccccccccc 

SEQ PSYWAPVFGTNIYADTSSICKTAVHAGVISNESGGDVDVMPVDKKKTYVGSLRNGVQSES 

SEG 

PRD ccceeeeeceeeccccceeeeeeeeccccccccccccceeecccceeeeeecccceeeee 

SEQ LGTPRDGKAFRIFAVRQ 

SEG 

PRD ccccccccceeeeeccc 



Prosite for DKFZphtes3_4b4.1 

PS00001   27->31     ASN_GLYCOSYLATION   PDOC00001 
PS00001   41->45     ASN_GLYCOSYLATION   PDOC00001 
PS00001   451->455   ASN_GLYCOSYLATION   PDOC00001 
PS00004   181->185   CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   276->280   CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   464->468   CAMP_PHOSPHO_SITE   PDOC00004 
PS00005   170->173   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   179->182   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   201->204   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   228->231   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   241->244   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   362->365   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   471->474   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   483->486   PKC_PHOSPHO_SITE    PDOC00005 
PS00006   29->33     CK2_PHOSPHO_SITE    PDOC00006 
PS00006   75->79     CK2_PHOSPHO_SITE    PDOC00006 
PS00006   81->85     CK2_PHOSPHO_SITE    PDOC00006 
PS00006   130->134   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   453->457   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   483->487   CK2_PHOSPHO_SITE    PDOC00006 
PS00007   385->393   TYR_PHOSPHO_SITE    PDOC00007 
PS00008   111->117   MYRISTYL            PDOC00008 
PS00008   115->121   MYRISTYL            PDOC00008 
PS00008   174->180   MYRISTYL            PDOC00008 
PS00008   204->210   MYRISTYL            PDOC00008 
PS00008   227->233   MYRISTYL            PDOC00008 
PS00008   300->306   MYRISTYL            PDOC00008 
PS00008   447->453   MYRISTYL            PDOC00008 
PS00008   470->476   MYRISTYL            PDOC00008 
PS01010   195->207   SCP_AG5_PR1_SC7_2   PDOC00772 
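
The ASN_GLYCOSYLATION rows of the Prosite listing can be reproduced with a regular expression. The sketch below assumes the usual PS00001 consensus N-{P}-[ST]-{P} and assumes that the listing's "start->end" notation gives a 1-based start and an end one past the last matched residue; the names `N_GLYC` and `scan` are hypothetical. Only the first 60 residues of the peptide are scanned here.

```python
import re

# PS00001 consensus N-{P}-[ST]-{P}, written as a lookahead so that
# overlapping sites would also be reported.
N_GLYC = re.compile(r"(?=N[^P][ST][^P])")


def scan(seq, pattern, width=4):
    """Return (start, end) pairs: 1-based start, end = start + width (assumed convention)."""
    return [(m.start() + 1, m.start() + 1 + width) for m in pattern.finditer(seq)]


# First 60 residues of the DKFZphtes3_4b4.1 peptide
first60 = "MSCVLGGVIPLGLLFLVCGSQGYLLPNVTLLEELLSKYQHNESHSRVRRAIPREDKEEIL"
print(scan(first60, N_GLYC))
```

With these assumptions the scan returns the sites starting at residues 27 and 41, matching the first two ASN_GLYCOSYLATION entries in the listing.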



Pfam for DKFZphtes3_4b4.1 

HMM_NAME SCP-like extracellular Proteins 

HMM *PQDEQDEWLNkHNDFRQQVGRGLETRGNPGPQPPAsNMnPMVWNDELAt 
P + ++E+L HN +R QV P ASNM M+W+DEL + 
Query 52 PREDKEEILMLHNKLRGQVQ PQASNMEYMTWDDELEK 88 

HMM I AQnW ANQC iFDHHDCCWNHsnYP YGQN I AWW S S T AN n P WnW s S M I QMW Y 
A WA+QCI +H ++ + S GQN+ + + ++++ +Q+WY 
Query 89 SAAAWASQCIWEHGPTSLLVSI-- GQNLGAHWG RYRS PGFHVQSWY 132 

HMM NEvkDYNYNWNTCkGG NNFmVCGH YTQMVWRnT f r IGCGRYI C YC 
+EVKDY Y + + +C HYTQ+VW+ T +IGC+ C+ 
Query 133 DEVKDYTYPYPSECNPWCPERCSGPMCTHYTQIVWATTNKIGCAVNTCRK 182 

HMM NNNWrKPDPWKhkWYYVCNYCPpGNYmN* 
+ W + W+ +Y VCNY P+GN+++ 
Query 183 MTVW -- GEVWENAVYFVCNYSPKGNWIG 208 
DKFZphtes3_4f17 



group: testes derived 

DKFZphtes3_4f17 encodes a novel 656 amino acid protein with weak similarity to 
methyl-CpG-binding proteins. 

Methylation at the DNA sequence 5'-CpG is required for mammalian development. 
Methyl-CpG-binding proteins bind specifically to methylated DNA via a related amino acid 
motif and can repress transcription. The novel protein does not contain such a motif. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to methyl-CpG-binding protein 

extension of HS557771/HSZ78337, 
there are some differences to these sequences 

Sequenced by AGOWA 

Locus: /map^lQ" 

Insert length: 2320 bp 

Poly A stretch at pos. 2266, polyadenylation signal at pos. 2251 



1 GGCAGGTTCG CGGGTCGCTG GCGGGGGTCG TGAGGGAGTG CGCCGGGAGC 
51 GGAGATATGG AGGGAGATGG TTCAGACCCA GAGCCTCCAG ATGCCGGGGA 
101 GGACAGCAAG TCCGAGAATG GGGAGAATGC GCCCATCTAC TGCATCTGCC 
151 GCAAACCGGA CATCAACTGC TTCATGATCG GGTGTGACAA CTGCAATGAG 
201 TGGTTCCATG GGGACTGCAT CCGGATCACT GAGAAGATGG CCAAGGCCAT 
251 CCGGGAGTGG TACTGTCGGG AGTGCAGAGA GAAAGACCCC AAGCTAGAGA 
301 TTCGCTATCG GCACAAGAAG TCACGGGAGC GGGATGGCAA TGAGCGGGAC 
351 AGCAGTGAGC CCCGGGATGA GGGTGGAGGG CGCAAGAGGC CTGTCCCTGA 
401 TCCAGACCTG CAGCGCCGGG CAGGGTCAGG GACAGGGGTT GGGGCCATGC 
451 TTGCTCGGGG CTCTGCTTCG CCCCACAAAT CCTCTCCGCA GCCCTTGGTG 
501 GCCACACCCA GCCAGCATCA CCAGCAGCAG CAGCAGCAGA TCAAACGGTC 
551 AGCCCGCATG TGTGGTGAGT GTGAGGCATG TCGGCGCACT GAGGACTGTG 
601 GTCACTGTGA TTTCTGTCGG GACATGAAGA AGTTCGGGGG CCCCAACAAG 
651 ATCCGGCAGA AGTGCCGGCT GCGCCAGTGC CAGCTGCGGG CCCGGGAATC 
701 GTACAAGTAC TTCCCTTCCT CGCTCTCACC AGTGACGCCC TCAGAGTCCC 
751 TGCCAAGGCC CCGCCGGCCA CTGCCCACCC AACAGCAGCC ACAGCCATCA 
801 CAGAAGTTAG GGCGCATCCG TGAAGATGAG GGGGCAGTGG CGTCATCAAC 
851 AGTCAAGGAG CCTCCTGAGG CTACAGCCAC ACCTGAGCCA CTCTCAGATG 
901 AGGACCTACC TCTGGATCCT GACCTGTATC AGGACTTCTG TGCAGGGGCC 
951 TTTGATGACC ATGGCCTGCC CTGGATGAGC GACACAGAAG AGTCCCCATT 
1001 CCTGGACCCC GCGCTGCGGA AGAGGGCAGT GAAAGTGAAG CATGTGAAGC 
1051 GTCGGGAGAA GAAGTCTGAG AAGAAGAAGG AGGAGCGATA CAAGCGGCAT 
1101 CGGCAGAAGC AGAAGCACAA GGATAAATGG AAACACCCAG AGAGGGCTGA 
1151 TGCCAAGGAC CCTGCGTCAC TGCCCCAGTG CCTGGGGCCC GGCTGTGTGC 
1201 GCCCCGCCCA GCCCAGCTCC AAGTATTGCT CAGATGACTG TGGCATGAAG 
1251 CTGGCAGCCA ACCGCATCTA CGAGATCCTC CCCCAGCGCA TCCAGCAGTG 
1301 GCAGCAGAGC CCTTGCATTG CTGAAGAGCA CGGCAAGAAG CTGCTCGAAC 
1351 GCATTCGCCG AGAGCAGCAG AGTGCCCGCA CCCGCCTTCA GGAAATGGAA 
1401 CGCCGATTCC ATGAGCTTGA GGCCATCATT CTACGTGCCA AGCAGCAGGC 
1451 TGTGCGCGAG GATGAGGAGA GCAACGAGGG TGACAGTGAT GACACAGACC 
1501 TGCAGATCTT CTGTGTTTCC TGTGGGCACC CCATCAACCC ACGTGTTGCC 
1551 TTGCGCCACA TGGAGCGCTG CTACGCCAAG TATGAGAGCC AGACGTCCTT 
1601 TGGGTCCATG TACCCCACAC GCATTGAAGG GGCCACACGA CTCTTCTGTG 
1651 ATGTGTATAA TCCTCAGAGC AAAACATACT GTAAGCGGCT CCAGGTGCTG 
1701 TGCCCCGAGC ACTCACGGGA CCCCAAAGTG CCAGCTGACG AGGTATGCGG 
1751 GTGCCCCCTT GTACGTGATG TCTTTGAGCT CACGGGTGAC TTCTGCCGCC 
1801 TGCCCAAGCG CCAGTGCAAT CGCCATTACT GCTGGGAGAA GCTGCGGCGT 
1851 GCGGAAGTGG ACTTGGAGCG CGTGCGTGTG TGGTACAAGC TGGACGAGCT 
1901 GTTTGAGCAG GAGCGCAATG TGCGCACAGC CATGACAAAC CGCGCGGGAT 
1951 TGCTGGCCCT GATGCTGCAC CAGACGATCC AGCACGATCC CCTCACTACC 
2001 GACCTGCGCT CCAGTGCCGA CCGCTGAGCC TCCTGGCCCG GACCCCTTAC 
2051 ACCCTGCATT CCAGATGGGG GAGCCGCCCG GTGCCCGTGT GTCCGTTCCT 
2101 CCACTCATCT GTTTCTCCGG TTCTCCCTGT GCCCATCCAC CGGTTGACCG 
2151 CCCATCTGCC TTTATCAGAG GGACTGTCCC CGTCGACATG TTCAGTGCCT 
2201 GGTGGGGCTG CGGAGTCCAC TCATCCTTGC CTCCTCTCCC TGGGTTTTGT 
2251 TAATAAAATT TTGAAGAAAC CAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2301 AAAAAAAAAA AAAAAAAAAA 



BLAST Results 

Entry HS557771 from database EMBLEST: 

Human chromosome 18 clone 2 mRNA sequence. 

Score = 7582, P = 0.0e+00, identities = 1560/1598 

Entry HSZ78337 from database EMBLEST: 

H. sapiens mRNA, expressed sequence tag ICRFp507H02194 (5') 
Score = 6339, P = 9.0e-281, identities = 1307/1347 

Entry HS095149 from database EMBL: 
human STS WI-6941. 
Score = 1210, P = 2.2e-49, identities = 246/251 



Medline entries 



98449942: 

Identification and characterization of a family of mammalian methyl-CpG 
binding proteins. 

9824997: 

Gene silencing by methyl-CpG-binding proteins. 



Peptide information for frame 3 



ORF from 57 bp to 2024 bp; peptide length: 656 
Category: similarity to known protein 
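
The start of this ORF can be checked by translating the insert from position 57 with the standard genetic code. The following is a minimal sketch: the names `CODON_TABLE` and `translate` are hypothetical, only bases 51 to 100 of the insert are included, and the standard nuclear code is assumed.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, start):
    """Translate from 1-based position `start` up to a stop codon or the end of `dna`."""
    peptide = []
    for i in range(start - 1, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Bases 51-100 of the DKFZphtes3_4f17 insert; position 57 of the insert
# is position 7 within this row.
row_51 = "GGAGATATGGAGGGAGATGGTTCAGACCCAGAGCCTCCAGATGCCGGGGA"
print(translate(row_51, 57 - 50))
```

The translated fragment, MEGDGSDPEPPDAG, agrees with the first 14 residues of the peptide given for frame 3.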



1 MEGDGSDPEP PDAGEDSKSE NGENAPIYCI CRKPDINCFM IGCDNCNEWF 
51 HGDCIRITEK MAKAIREWYC RECREKDPKL EIRYRHKKSR ERDGNERDSS 
101 EPRDEGGGRK RPVPDPDLQR RAGSGTGVGA MLARGSASPH KSSPQPLVAT 
151 PSQHHQQQQQ QIKRSARMCG ECEACRRTED CGHCDFCRDM KKFGGPNKIR 
201 QKCRLRQCQL RARESYKYFP SSLSPVTPSE SLPRPRRPLP TQQQPQPSQK 
251 LGRIREDEGA VASSTVKEPP EATATPEPLS DEDLPLDPDL YQDFCAGAFD 
301 DHGLPWMSDT EESPFLDPAL RKRAVKVKHV KRREKKSEKK KEERYKRHRQ 
351 KQKHKDKWKH PERADAKDPA SLPQCLGPGC VRPAQPSSKY CSDDCGMKLA 
401 ANRIYEILPQ RIQQWQQSPC IAEEHGKKLL ERIRREQQSA RTRLQEMERR 
451 FHELEAIILR AKQQAVREDE ESNEGDSDDT DLQIFCVSCG HPINPRVALR 
501 HMERCYAKYE SQTSFGSMYP TRIEGATRLF CDVYNPQSKT YCKRLQVLCP 
551 EHSRDPKVPA DEVCGCPLVR DVFELTGDFC RLPKRQCNRH YCWEKLRRAE 
601 VDLERVRVWY KLDELFEQER NVRTAMTNRA GLLALMLHQT IQHDPLTTDL 
651 RSSADR 



BLASTP hits 



No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_4f17, frame 3 

TREMBL:CEF52B11_4 gene: "F52B11.1"; Caenorhabditis elegans cosmid 
F52B11, N = 2, Score = 316, P = 8.8e-27 

TREMBL:HSAB2331_1 gene: "KIAA0333"; Human mRNA for KIAA0333 gene, 
partial cds., N = 2, Score = 163, P = 2.8e-13 

TREMBL:SPCC594_5 gene: "SPCC594.05c"; product: "putative 
transcriptional regulatory protein, phd finger containing"; S.pombe 
chromosome III cosmid c594., N = 3, Score = 168, P = 3.6e-12 

TREMBL:AF072240_1 gene: "Mbd1"; product: "methyl-CpG binding protein 
MBD1"; Mus musculus methyl-CpG binding protein MBD1 (Mbd1) mRNA, 
complete cds., N = 2, Score = 189, P = 7.6e-11 



>TREMBL:CEF52B11_4 gene: "F52B11.1"; Caenorhabditis elegans cosmid F52B11 
Length = 523 

HSPs: 

Score = 316 (47.4 bits), Expect = 8.8e-27, Sum P(2) = 8.8e-27 
Identities = 100/336 (29%), Positives = 167/336 (49%) 

Query: 333 REKKSEKKKEERYKRHRQ-KQKHKDKWKHPERADAKDPASLP-QCLGPGCVRPAQPSSKY 390 

+++K+ E Y +R +Q+ D + + +A +P P QCL P C+ ++ SKY 
Sbjct: 118 QQRKANIINERDYVPNRPTRQQSADLRRKRTQLNA-EPDKHPRQCLNPNCIYESRIDSKY 176 

Query: 391 CSDDCGMKLAANRI YEILPQRIQQW QQSPCIAEEHGKKLLERIRREQQSARTRLQ 445 

CSD+CG +LA R+ EI LP R +Q+ P E+ K +1 RE Q + 

Sbjct: 177 CSDECGKELARMRLTEILPNRCKQYFFEGPSGGPRSLEDEIKPKRAKINREVQKLTESEK 236 

Query: 446 EMERRFHEL-EAIILRAKQQAVREDEESNEGDSDDTDLQIFCVSCGHPINPRVAL-RHME 503 

M ++L EI + K Q + +E D +L C+ CG P P + +H+E 

Sbjct: 237 NMMAFLNKLVEFI KTQLKLQPLGTEERY DDNLYEGCI VCGLPDI PLLKYTKHI E 290 

Query: 504 RCYAKYESQTSFGSMYPTRIEGATRLFCDVYNPQSKTYCKRLQVLCPEHSRDPKVPADEV 563 

C+A+ E SFG+ P + +C+ Y+ ++ ++CKRL+ LCPEH + +V 

Sbjct: 291 LCWARSEKAISFGA--PEK--NNDMFYCEKYDSRTNSFCKRLKSLCPEHRKLGDEQHLKV 346 

Query: 564 CGCP LVRDVFELTGDF CRLPKRQCNRHYCWEKLRRAEVDLERVR 607 

CG P V ++ E+ F CR K C++H+ W R ++LE+ 

Sbjct: 347 CGYPKKWEDGMIETAKTVSELIEMEDPFGEEGCRTKKDACHKHHKWIPSLRGTIELEQAC 406 

Query: 608 VWYKLDELFEQ--ERNVRTAMTNRAGLLALMLHQTIQHDPLTTDLRSSA 654 

++ K+ EL + + N T A L++M+H+ + + LR+ A 

Sbjct: 407 LFQKMYELCHEMHKLNAHAEWTTNA--LSIMMHKQPSTEKCSFFLRNFA 453 

Score = 53 (8.0 bits), Expect = 8.8e-27, Sum P(2) = 8.8e-27 
Identities = 24/100 (24%), Positives = 41/100 (41%) 

Query: 169 CGECEACRRTEDCGHCDFCR DMKK-FGGPNKIRQKCRLRQCQLRARESYKYFPSS 222 

C C C ++CG C CR DM+K F +K + RQ + + + 

Sbjct: 17 CMNCIRCNDEKNCGTCWPCRNGKTCDMRKCFSAKRLYNEKVK-RQTDENLK-AIMAKTAQ 74 

Query: 223 LSPVTPSESLPRPRRPLPTQQQPQPSQKLGRIR-EDEGAVASS 264 

+ + P P+ +QQ + +K GR + G A++ 
Sbjct: 75 REAAHQAATTTAPSAPVVIEQQVE-KKKRGRKKGSGNGGAAAA 116 

Score = 48 (7.2 bits), Expect = 2.9e-26, Sum P(2) = 2.9e-26 
Identities = 13/39 (33%), Positives = 19/39 (48%) 

Query: 179 EDCGHCDFCRDMKKFGG--PNKIRQKCRLRQCQLRARESY 216 

EC+CCDKG P++C +R+C A+ Y 
Sbjct: 15 ERCMNCIRCNDEKNCGTCWPCRNGKTCDMRKC-FSAKRLY 53 



Pedant information for DKFZphtes3_4f17, frame 3 



Report for DKFZphtes3_4f17.3 

[LENGTH] 656 
[MW] 75711.71 
[pi] 8.61 
[HOMOL] TREMBL:CEF52B11_4 gene: "F52B11.1"; Caenorhabditis elegans cosmid F52B11 3e 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPL138c] 3e-10 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YNL097c] 2e-04 
[PROSITE] MYRISTYL 6 
[PROSITE] AMIDATION 2 
[PROSITE] CK2_PHOSPHO_SITE 8 
[PROSITE] TYR_PHOSPHO_SITE 3 
[PROSITE] GLYCOSAMINOGLYCAN 1 
[PROSITE] PKC_PHOSPHO_SITE 9 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 18.75 % 
[KW] COILED_COIL 4.57 % 



SEQ MEGDGSDPEPPDAGEDSKSENGENAPIYCICRKPDINCFMIGCDNCNEWFHGDCIRITEK 

SEG 

PRD cccccccccccccccccccccccccceeeeeeccccceeeeecccccccccccchhhhhh 

COILS 

SEQ MAKAIREWYCRECREKDPKLEIRYRHKKSRERDGNERDSSEPRDEGGGRKRPVPDPDLQR 

SEG 

PRD hhhhhhhhhhhccccccccchhhhhhhhhccccccccccccccccccccccccccccccc 

COILS 

SEQ RAGSGTGVGAMLARGSASPHKSSPQPLVATPSQHHQQQQQQIKRSARMCGECEACRRTED 

SEG xxxxxxxxx 

PRD cccccccceeeecccccccccccccccccchhhhhhhhhhhhhhhhhhcccccccccccc 

COILS 

SEQ CGHCDFCRDMKKFGGPNKIRQKCRLRQCQLRARESYKYFPSSLSPVTPSESLPRPRRPLP 

SEG xxxxxxxxxxxxxx xxxxxxxxxxxxxx 

PRD cccccccccccccccccchhhhhhhhhhhhhhhhhhcccccccccccccccccccccccc 

COILS 

SEQ TQQQPQPSQKLGRIREDEGAVASSTVKEPPEATATPEPLSDEDLPLDPDLYQDFCAGAFD 

SEG xxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

COILS 

SEQ DHGLPWMSDTEESPFLDPALRKRAVKVKHVKRREKKSEKKKEERYKRHRQKQKHKDKWKH 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchh 

COILS 

SEQ PERADAKDPASLPQCLGPGCVRPAQPSSKYCSDDCGMKLAANRIYEILPQRIQQWQQSPC 

SEG 

PRD hhhhhccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhccch 

COILS 

SEQ IAEEHGKKLLERIRREQQSARTRLQEMERRFHELEAIILRAKQQAVREDEESNEGDSDDT 

SEG xxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccc 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ DLQIFCVSCGHPINPRVALRHMERCYAKYESQTSFGSMYPTRIEGATRLFCDVYNPQSKT 

SEG X 

PRD ceeeeeeeccccccccchhhhhhhhhhhhhhcccccccccccccccceeeeeeccccccc 

COILS 

SEQ YCKRLQVLCPEHSRDPKVPADEVCGCPLVRDVFELTGDFCRLPKRQCNRHYCWEKLRRAE 

SEG 

PRD cchhhhhhhccccccccccceeeeccccchhhhhccccccccccccccchhhhhhhhhhh 

COILS 

SEQ VDLERVRVWYKLDELFEQERNVRTAMTNRAGLLALMLHQTIQHDPLTTDLRSSADR 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccc 

COILS 



Prosite for DKFZphtes3_4f17.3 

PS00002   124->128   GLYCOSAMINOGLYCAN   PDOC00002 
PS00005   58->61     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   165->168   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   215->218   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   248->251   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   265->268   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   337->340   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   387->390   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   439->442   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   627->630   PKC_PHOSPHO_SITE    PDOC00005 
PS00006   6->10      CK2_PHOSPHO_SITE    PDOC00006 
PS00006   17->21     CK2_PHOSPHO_SITE    PDOC00006 
PS00006   227->231   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   265->269   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   280->284   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   308->312   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   521->525   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   652->656   CK2_PHOSPHO_SITE    PDOC00006 
PS00007   339->346   TYR_PHOSPHO_SITE    PDOC00007 
                     TYR_PHOSPHO_SITE    PDOC00007 
                     TYR_PHOSPHO_SITE    PDOC00007 
                     MYRISTYL 
                     MYRISTYL            PDOC00008 
                     MYRISTYL            PDOC00008 
PS00008                                  PDOC00008 
PS00009   107->111   AMIDATION           PDOC00009 
PS00009   425->429   AMIDATION           PDOC00009 



(No Pfam data available for DKFZphtes3_4f17.3) 

DKFZphtes3_4f5 

group: signal transduction 

DKFZphtes3_4f5.3 encodes a novel 790 amino acid protein similar to beta-transducins. 

The protein contains 3 WD-40 repeats, which are typical for the beta-transducin subunit of 
G-proteins. The beta subunits seem to be required for the replacement of GDP by GTP as well as 
for membrane anchoring and receptor recognition. In addition, a cytochrome c family 
heme-binding site signature is present. The protein is larger (790 amino acids) than the usual 
eukaryotic G-beta transducins (about 340 amino acids). 

The new protein can find application in modulating/blocking G-protein-dependent pathways. 

similarity to S.pombe "beta-transducin" 

complete cDNA, complete cds, EST hits 

on genomic level encoded by HS313D11, at least 7 exons; these exons match 
only partially with the predicted transcripts in HS313D11 
Sequenced by AGOWA 
Locus: /map="16p13.3" 
Insert length: 3166 bp 

No poly A stretch found, no polyadenylation signal found 

1 GGCGGCTTCC GGCGCGGCGG TTCCGGACAA CCGTGCGCTT TTAGTAAAAG 
51 ATTGGGGTTC GCGCGGGGGA GAAGGGCTGC CCCGGGCCCT CTGGTTCTCG 

101 TCCCGCAGCG TCCGCTCCCC CGCGCCACTG CGCCGCTCCC AGGAACCCTG 

151 TACTCCGGGG TCGCCGGCTT CTCTCCTGCC TCCGGTCCCG CCAGACACCT 

201 CGAGCTCCTT AAGTAGCTCG GTCCTTGACG TCCCTCTGGG CCCTTCCCGC 

251 GTCTATCGCC TGAGTCCCCG GGCCCCTCTA GCCCTCTGTT CCCTCCCCTC 

301 TTTTGTTCCT CCCTAGAGCC CCGCCGCCCT CAGGGCTGAC AGTGTGGACG 

351 GCGGGAGTCT CCTCGCTCCC CTGCTGGGAT TGACTGACCG AGCGTTTAGT 

401 GACTGCCCAG ATCTGGCTGA TGGGGGTACC GAGAGGTGGC CTGGGCCGGG 

451 AATGTCCAGC TAGAGTCTTC CGTGGAAGTC AGACATGAAA CTGACAGGCC 

501 TAAGGGAAGC TAGGAAGTCC CCTCACCGCT CAGCCAGGGT GATGGGCTGG 

551 ACTGACAGAC TCCAGTGAAT TTGAGCTTGC CTGTCAGGCT GATTGGCTGA 

601 TAGACAGCCC TGGATTGGCT CACTAAGACT GACCAGCCCG GGACCAAGCA 

651 GTTCTGGGGT CCCAACCTGG GTGGAAGGTC TGAACTGATG ACCCACCCAG 

701 GCTGACCAGG CCAGCCCACC TCACTGACCT CCTGACCCCT GACCTCATCA 

751 CCTGTGCAGC CATGGAGAAG ATGTCCCGTG TGACCACAGC CCTGGGTGGC 

801 AGCGTGCTGA CAGGCCGCAC CATGCACTGC CACCTGGATG CTCCCGCCAA 

851 TGCCATCAGT GTGTGCCGCG ACGCAGCCCA GGTGGTCGTG GCAGGCCGTA 

901 GCATCTTCAA GATCTATGCC ATCGAGGAGG AACAGTTCGT GGAAAAGCTG 

951 AACCTGCGTG TGGGGCGCAA GCCTTCGCTT AACCTGAGCT GTGCTGACGT 
1001 GGTCTGGCAC CAGATGGATG AGAACCTGCT GGCCACAGCA GCCACCAATG 
1051 GCGTGGTGGT CACGTGGAAC CTGGGCCGGC CATCCCGCAA CAAGCAGGAC 
1101 CAGCTGTTCA CAGAACACAA GCGCACGGTA AACAAAGTCT GCTTCCACCC 
1151 CACCGAAGCC CACGTGCTGC TCAGTGGCTC CCAGGATGGC TTCATGAAGT 
1201 GCTTTGACCT CCGCAGAAAG GACTCTGTCA GCACCTTCTC GGGCCAGTCG 
1251 GAGAGCGTGC GGGACGTGCA GTTCAGTATC CGGGACTACT TCACCTTCGC 
1301 CTCCACCTTT GAGAACGGCA ATGTGCAGCT CTGGGACATC CGGCGTCCCG 
1351 ACCGGTGCGA GAGGATGTTC ACAGCCCACA ACGGACCCGT CTTCTGCTGC 
1401 GACTGGCACC CCGAGGACAG GGGCTGGTTG GCCACTGGAG GGCGCGACAA 
1451 GATGGTGAAG GTCTGGGACA TGACCACGCA CCGTGCCAAG GAGATGCACT 
1501 GTGTGCAGAC CATCGCCTCG GTGGCCCGTG TGAAGTGGCG GCCAGAGTGC 
1551 CGCCACCACC TGGCCACGTG CTCCATGATG GTGGACCACA ACATCTATGT 
1601 TTGGGACGTG CGCCGGCCCT TCGTGCCAGC TGCCATGTTT GAGGAACACC 
1651 GAGACGTCAC CACGGGAATT GCCTGGCGCC ACCCCCACGA CCCCTCCTTC 
1701 CTGCTGTCTG GCTCCAAGGA CAGCTCGCTG TGCCAGCACC TGTTCCGCGA 
1751 CGCCAGCCAG CCCGTCGAGC GCGCCAACCC TGAGGGCCTC TGCTACGGCC 
1801 TCTTCGGGGA CCTGGCCTTC GCCGCCAAGG AGAGCCTCGT GGCTGCCGAG 
1851 TCGGGGCGCA AGCCCTACAC TGGCGACCGG CGCCACCCCA TCTTCTTTAA 
1901 GCGCAAGCTG GACCCTGCCG AGCCCTTCGC AGGCCTCGCC TCCAGTGCCC 
1951 TCAGTGTCTT TGAGACGGAG CCAGGTGGCG GCGGCATGCG CTGGTTTGTG 
2001 GACACAGCTG AGCGTTATGC GCTGGCTGGC CGGCCACTGG CCGAGCTCTG 
2051 TGACCACAAC GCAAAGGTGG CTCGAGAGCT TGGCCGCAAC CAGGTGGCGC 
2101 AAACGTGGAC CATGCTGCGG ATCATCTACT GCAGCCCTGG CCTAGTGCCC 
2151 ACTGCAAACC TCAACCACAG TGTGGGCAAG GGTGGCTCCT GTGGCCTCCC 
2201 GCTCATGAAC AGTTTCAACC TGAAGGATAT GGCCCCAGGG TTGGGCAGTG 
2251 AGACGCGGCT GGACCGCAGC AAAGGAGATG CACGGAGCGA CACAGTTCTG 
2301 CTCGACTCCT CGGCCACACT CATCACCAAT GAGGATAACG AGGAAACCGA 
2351 GGGCAGCGAC GTACCTGCCG ACTACCTGCT GGGTGACGTG GAAGGTGAGG 



2401 AGGACGAGCT GTACCTGCTG GATCCGGAAC ACGCGCACCC CGAGGACCCT 
2451 GAGTGCGTGC TGCCGCAGGA GGCCTTTCCG CTGCGCCACG AGATCGTGGA 
2501 CACGCCTCCC GGACCCGAGC ACCTGCAGGA CAAGGCCGAC TCCCCGCACG 
2551 TGAGCGGCAG CGAGGCGGAT GTGGCCTCCC TGGCCCCCGT GGACTCCTCC 
2601 TTCTCGCTCC TGTCTGTCTC ACACGCGCTC TACGACAGCC GCCTGCCGCC 
2651 CGACTTCTTC GGCGTGCTGG TGCGCGACAT GCTGCACTTC TACGCTGAGC 
2701 AGGGCGACGT GCAGATGGCT GTGTCTGTGC TCATCGTCCT GGGTGAACGG 
2751 GTGCGCAAGG ACATCGACGA GCAGACCCAG GAGCACTGGT ACACTTCCTA 
2801 CATCGACCTG CTGCAGCGCT TCCGCCTCTG GAACGTGTCC AACGAGGTGG 
2851 TCAAGCTGAG CACCAGCCGC GCCGTCAGCT GCCTCAACCA GGCCTCCACC 
2901 ACCCTGCACG TCAACTGCAG CCACTGCAAG CGGCCCATGA GCAGCCGGGG 
2951 CTGGGTCTGC GACAGGTGCC ACCGCTGCGC CAGCATGTGT GCCGTCTGCC 
3001 ACCACGTAGT CAAGGGTCTC TTCGTGTGGT GCCAGGGCTG CAGCCACGGC 
3051 GGCCACCTGC AGCACATCAT GAAGTGGCTG GAAGGCAGCT CCCACTGTCC 
3101 CGCAGGCTGC GGCCACCTCT GCGAGTACTC CTGACGGGGC ATCTGCTGGG 
3151 CTTGCCCGGG CGGCCG 



BLAST Results 



Entry HS313D11 from database EMBL: 
Human DNA sequence from cosmid 313D11 from a contig on the short arm of 
chromosome 16. Contains ESTs, STS and CpG islands. 
Score = 6238, P = 0.0e+00, identities = 1318/1391 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 762 bp to 3131 bp; peptide length: 790 
Category: similarity to known protein 



1 MEKMSRVTTA LGGSVLTGRT MHCHLDAPAN AISVCRDAAQ VVVAGRSIFK 
51 IYAIEEEQFV EKLNLRVGRK PSLNLSCADV VWHQMDENLL ATAATNGVVV 
101 TWNLGRPSRN KQDQLFTEHK RTVNKVCFHP TEAHVLLSGS QDGFMKCFDL 
151 RRKDSVSTFS GQSESVRDVQ FSIRDYFTFA STFENGNVQL WDIRRPDRCE 
201 RMFTAHNGPV FCCDWHPEDR GWLATGGRDK MVKVWDMTTH RAKEMHCVQT 
251 IASVARVKWR PECRHHLATC SMMVDHNIYV WDVRRPFVPA AMFEEHRDVT 
301 TGIAWRHPHD PSFLLSGSKD SSLCQHLFRD ASQPVERANP EGLCYGLFGD 
351 LAFAAKESLV AAESGRKPYT GDRRHPIFFK RKLDPAEPFA GLASSALSVF 
401 ETEPGGGGMR WFVDTAERYA LAGRPLAELC DHNAKVAREL GRNQVAQTWT 
451 MLRIIYCSPG LVPTANLNHS VGKGGSCGLP LMNSFNLKDM APGLGSETRL 
501 DRSKGDARSD TVLLDSSATL ITNEDNEETE GSDVPADYLL GDVEGEEDEL 
551 YLLDPEHAHP EDPECVLPQE AFPLRHEIVD TPPGPEHLQD KADSPHVSGS 
601 EADVASLAPV DSSFSLLSVS HALYDSRLPP DFFGVLVRDM LHFYAEQGDV 
651 QMAVSVLIVL GERVRKDIDE QTQEHWYTSY IDLLQRFRLW NVSNEVVKLS 
701 TSRAVSCLNQ ASTTLHVNCS HCKRPMSSRG WVCDRCHRCA SMCAVCHHVV 
751 KGLFVWCQGC SHGGHLQHIM KWLEGSSHCP AGCGHLCEYS 

BLASTP hits 

Entry YDSB_SCHPO from database SWISSPROT: 

HYPOTHETICAL 93.2 KD TRP-ASP REPEATS CONTAINING PROTEIN C4F8.11 IN 
CHROMOSOME I. >TREMBL:SPAC4F8_11 gene: "SPAC4F8.11"; product: 
"beta-transducin"; S.pombe chromosome I cosmid c4F8. 
Score = 404, P = 3.0e-42, identities = 169/639, positives = 278/639 

Entry PEX7_HUMAN from database SWISSPROT: 

PEROXISOMAL TARGETING SIGNAL 2 RECEPTOR (PTS2 RECEPTOR) (PEROXIN-7). 
>TREMBL:HSU76560_1 gene: "Pex7"; product: "peroxisome targeting signal 
2 receptor"; Human peroxisome targeting signal 2 receptor (Pex7) mRNA, 
complete cds. >TREMBL:HSU88871_1 gene: "HsPEX7"; product: "HsPex7p"; 
Human HsPex7p (HsPEX7) mRNA, complete cds. 

Score = 220, P = 1.1e-15, identities = 62/244, positives = 107/244 

Entry PEX7_MOUSE from database SWISSPROT: 

PEROXISOMAL TARGETING SIGNAL 2 RECEPTOR (PTS2 RECEPTOR) (PEROXIN-7). 
>TREMBL:MMU69171_1 product: "peroxisomal PTS2 receptor"; Mus musculus 
peroxisomal PTS2 receptor mRNA, complete cds. 

Score = 214, P = 5.3e-15, identities = 60/240, positives = 106/240 

Entry ATAC2294_7 from database TREMBL: 

gene: "F11P17.7"; Arabidopsis thaliana chromosome I BAC F11P17 genomic 
sequence, complete sequence. 

Score = 232, P = 3.4e-14, identities = 68/260, positives = 120/260 

Entry S66835 from database PIR: 

probable membrane protein YOL138c - yeast (Saccharomyces cerevisiae) 
>TREMBL:SCYOL138C_1 S. cerevisiae chromosome XV reading frame ORF 
YOL138C 

Score = 136, P = 2.5e-13, identities = 24/77, positives = 44/77 



Alert BLASTP hits for DKFZphtes3_4f5, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_4f5, frame 3 

Report for DKFZphtes3_4f5.3 

[LENGTH] 790 
[MW] 88207.10 
[pi] 6.05 
[HOMOL] SWISSPROT:YDSB_SCHPO HYPOTHETICAL 93.2 KD TRP-ASP REPEATS CONTAINING PROTEIN 
C4F8.11 IN CHROMOSOME I. 9e-44 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOL138c] 5e-16 
[FUNCAT] 10.04.09 regulation of g-protein activity [S. cerevisiae, YBR195c] 3e-11 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YBR195c] 3e-11 
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YBR195c] 3e-11 
[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YBR195c] 3e-11 
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YBR195c] 3e-11 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YCR072c beta-transducin family] 3e-10 
[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YBR198c TAF90 - 
TFIID subunit] 9e-09 
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YLL011w] 1e-07 
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, 
YDL195w] 2e-07 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL195w] 2e-07 
[FUNCAT] 30.19 peroxisomal organization [S. cerevisiae, YDR142c] 4e-07 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YDR142c] 4e-07 
[FUNCAT] 08.10 peroxisomal transport [S. cerevisiae, YDR142c] 4e-07 
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YER107c] 4e-07 
[FUNCAT] 04.07 rna transport [S. cerevisiae, YER107c] 4e-07 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YER107c] 4e-07 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YGL003c] 5e-07 
[FUNCAT] 06.13 proteolysis [S. cerevisiae, YGL003c] 5e-07 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YCR084c] 8e-07 
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPR178w] 1e-06 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YLR129w] 3e-06 
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YCR057c] 1e-05 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YCR057c] 
1e-05 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, 
palmitylation, farnesylation and processing) [S. cerevisiae, YEL056w] 2e-04 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YOR272w] 6e-04 
[SCOP] d1gotb_ 2.46.3.1.1 beta1-subunit of the signal-transducing 5e-06 
[PIRKW] duplication 7e-10 
[PIRKW] signal transduction 7e-08 
[PIRKW] peroxisome 9e-06 
[PIRKW] heterotrimer 7e-08 
[PIRKW] GTP binding 7e-08 
[PIRKW] peroxisome biogenesis 9e-06 
[PIRKW] transmembrane protein 1e-14 
[SUPFAM] MSI1 protein 7e-10 
[SUPFAM] WD repeat homology 1e-14 
[SUPFAM] GTP-binding regulatory protein beta chain 7e-08 
[SUPFAM] PRL1 protein 3e-08 
[SUPFAM] coatomer complex beta' chain 1e-06 
[PROSITE] CYTOCHROME_C 1 
[PROSITE] WD_REPEATS 3 
[PROSITE] MYRISTYL 10 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 11 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 7 
[PROSITE] ASN_GLYCOSYLATION 4 
[PFAM] WD domain, G-beta repeats 
[KW] All_Beta 
[KW] 3D 
[KW] LOW_COMPLEXITY 2.28 % 

SEQ MEKMSRVTTALGGSVLTGRTMHCHLDAPANAISVCRDAAQVVVAGRSIFKIYAIEEEQFV 
SEG 
1gotB 

SEQ EKLNLRVGRKPSLNLSCADVVWHQMDENLLATAATNGVWTWNLGRPSRNKQDQLFTEHK 
SEG 
1gotB TTCEEEEEETTTEEEEEET-TTTCEEE--EEECCC 

SEQ RTVNKVCFHPTEAHVLLSGSQDGFMKCFDLRRKDSVSTFSGQSESVRDVQFSIRDYFTFA 
SEG 
1gotB CCEEEEEEETT-TCEEEEEETTTEEEEEETTTTEEEEEECBTTCCEEEEEETTTTTEEEE 

SEQ STFENGNVQLWDIRRPDRCERMFTAHNGPVFCCDWHPEDRGWLATGGRDKMVKVWDMTTH 
SEG 
1gotB E-ETTTEEEEEETTTTEEEE-EEECCCCCEEEEEE-TTTTCCEEEEETTTEEEEEC 

SEQ RAKEMHCVQTIASVARVKWRPECRHHLATCSMMVDHNIYVWDVRRPFVPAAMFEEHRDVT 
SEG 
1gotB 

SEQ TGIAWRHPHDPSFLLSGSKDSSLCQHLFRDASQPVERANPEGLCYGLFGDLAFAAKESLV 
SEG 
1gotB 

SEQ AAESGRKPYTGDRRHPIFFKRKLDPAEPFAGLASSALSVFETEPGGGGMRWFVDTAERYA 
SEG 
1gotB 

SEQ LAGRPLAELCDHNAKVARELGRNQVAQTWTMLRIIYCSPGLVPTANLNHSVGKGGSCGLP 
SEG 
1gotB 

SEQ LMNSFNLKDMAPGLGSETRLDRSKGDARSDTVLLDSSATLITNEDNEETEGSDVPADYLL 
SEG xxxx 
1gotB 

SEQ GDVEGEEDELYLLDPEHAHPEDPECVLPQEAFPLRHEIVDTPPGPEHLQDKADSPHVSGS 
SEG xxxxxxxxxxxxxx 
1gotB 

SEQ EADVASLAPVDSSFSLLSVSHALYDSRLPPDFFGVLVRDMLHFYAEQGDVQMAVSVLIVL 
SEG 
1gotB 

SEQ GERVRKDIDEQTQEHWYTSYIDLLQRFRLWNVSNEWKLSTSRAVSCLNQASTTLHVNCS 
SEG 
1gotB 

SEQ HCKRPMSSRGWVCDRCHRCASMCAVCHHVVKGLFVWCQGCSHGGHLQHIMKWLEGSSHCP 
SEG 
1gotB 

SEQ AGCGHLCEYS 
SEG 
1gotB 



Prosite for DKFZphtes3_4f5.3 

PS00001   74->78   ASN_GLYCOSYLATION   PDOC00001 
PS00001  468->472  ASN_GLYCOSYLATION   PDOC00001 
PS00001  691->695  ASN_GLYCOSYLATION   PDOC00001 
PS00001  718->722  ASN_GLYCOSYLATION   PDOC00001 
PS00004   69->73   CAMP_PHOSPHO_SITE   PDOC00004 
PS00004  152->156  CAMP_PHOSPHO_SITE   PDOC00004 
PS00005   17->20   PKC_PHOSPHO_SITE    PDOC00005 
PS00005  165->168  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  172->175  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  239->242  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  364->367  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  701->704  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  727->730  PKC_PHOSPHO_SITE    PDOC00005 
PS00006   76->80   CK2_PHOSPHO_SITE    PDOC00006 
PS00006  165->169  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  172->176  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  181->185  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  398->402  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  498->502  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  503->507  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  522->526  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  598->602  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  600->604  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  679->683  CK2_PHOSPHO_SITE    PDOC00006 
PS00007  337->346  TYR_PHOSPHO_SITE    PDOC00007 
PS00008   13->19   MYRISTYL            PDOC00008 
PS00008   97->103  MYRISTYL            PDOC00008 
PS00008  139->145  MYRISTYL            PDOC00008 
PS00008  161->167  MYRISTYL            PDOC00008 
PS00008  317->323  MYRISTYL            PDOC00008 
PS00008  342->348  MYRISTYL            PDOC00008 
PS00008  391->397  MYRISTYL            PDOC00008 
PS00008  460->466  MYRISTYL            PDOC00008 
PS00008  474->480  MYRISTYL            PDOC00008 
PS00008  759->765  MYRISTYL            PDOC00008 
PS00009   67->71   AMIDATION           PDOC00009 
PS00009  364->368  AMIDATION           PDOC00009 
PS00190  743->749  CYTOCHROME_C        PDOC00169 
PS00678   90->105  WD_REPEATS          PDOC00574 
PS00678  223->238  WD_REPEATS          PDOC00574 
PS00678  269->284  WD_REPEATS          PDOC00574 



Pfam for DKFZphtes3_4f5.3 



HMM_NAME WD domain, G-beta repeats 

HMM *MrGHnnWVWCVaFSPDGrWFIvSGSWDgTCRLWD* 

++ HN++V C+ ++P+ R +++G++D+ +++WD 
Query 203 FTAHNGPVFCCDWHPEDRGWLATGGRDKMVKVWD 236 



DKFZphtes3_4h6 



group: intracellular transport/trafficking 

DKFZphtes3_4h6 encodes a novel 622 amino acid protein with strong similarity to the kinesin 
light chain. 

Kinesin is a microtubule-based motor protein that pulls vesicles or organelles towards the 
plus end of microtubules. Structural changes in the protein that drive motility are coupled to 
ATP binding and hydrolysis. The novel protein is similar to the kinesin light chain, which is 
part of the functional kinesin holoenzyme, a tetrameric protein. The light chain has been 
proposed to function in coupling cargo to the heavy chain or in modulating the ATPase activity 
of the heavy chain. The novel protein contains two kinesin light chain repeats and one RGD 
cell-attachment site. 

The novel kinesin protein can find application in modulating the function of kinesin and in 
modulating intracellular transport along microtubules. 



strong similarity to Kinesin light chain 

complete cDNA, complete cds, start at 150, EST hits (few) 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 2992 bp 

Poly A stretch at pos. 2914, polyadenylation signal at pos. 2893 



1 GGCGGGATGG AGGCGGCGGG ACCGGCTCGC GGGTGCGGGT CCGGGTGAAG 
51 CGGGAGGCAG CCAGAGTCGG AGCCGGGCCC GAGCACCAGG CGCAGGCCCG 
101 GCGCCCGCCT GCCCGCACCC TCGTCCTCAC AGACGCCACA GCCATGGCCA 
151 TGATGGTGTT TCCGCGGGAG GAGAAGCTGA GCCAGGATGA GATCGTGCTG 
201 GGCACCAAGG CTGTCATCCA GGGACTGGAG ACTCTGCGTG GGGAGCATCG 
251 TGCCCTGCTG GCTCCTCTGG TTGCACCTGA GGCCGGCGAA GCCGAGCCTG 
301 GCTCGCAGGA GCGCTGCATC CTCCTGCGTC GCTCCCTGGA AGCCATTGAG 
351 CTTGGGCTGG GGGAGGCCCA GGTGATCTTG GCATTGTCGA GCCACCTGGG 
401 GGCTGTAGAA TCAGAGAAGC AGAAGCTGCG GGCGCAGGTG CGGCGTCTGG 
451 TGCAGGAGAA CCAGTGGCTG CGTGAGGAGC TGGCGGGGAC ACAGCAGAAG 
501 CTGCAGCGCA GTGAGCAGGC CGTGGCCCAG CTCGAGGAGG AGAAGCAGCA 
551 CTTGCTGTTC ATGAGCGAGA TCCGCAAGTT GGATGAAGAC GCCTCCCCTA 
601 ACGAGGAGAA GGGGGACGTC CCCAAAGACA CACTGGATGA CCTGTTCCCC 
651 AATGAGGATG AGCAGAGCCC AGCCCCTAGC CCAGGAGGAG GGGATGTGTC 
701 TGGTCAGCAT GGGGGCTACG AGATCCCGGC CCGGCTCCGC ACCCTGCACA 
751 ACCTGGTGAT CCAATACGCC TCACAGGGCC GCTACGAGGT AGCTGTGCCA 
801 CTCTGCAAGC AGGCACTCGA AGACCTGGAG AAGACGTCAG GCCACGACCA 
851 CCCTGACGTT GCCACCATGC TGAACATCCT GGCACTGGTC TATCGGGATC 
901 AGAACAAGTA CAAGGAGGCT GCCCACCTGC TCAATGATGC TCTGGCCATC 
951 CGGGAGAAAA CACTGGGCAA GGACCACCCA GCCGTGGCTG CGACACTAAA 
1001 CAACCTGGCA GTCCTGTATG GCAAGAGGGG CAAGTACAAG GAGGCTGAGC 
1051 CATTGTGCAA GCGGGCACTG GAGATCCGGG AGAAGGTCCT GGGCAAGTTT 
1101 CACCCAGATG TGGCCAAGCA GCTCAGCAAC CTGGCCCTGC TGTGCCAGAA 
1151 CCAGGGCAAA GCTGAGGAGG TGGAATATTA CTATCGGCGG GCACTGGAGA 
1201 TCTATGCTAC ACGCCTCGGG CCCGATGACC CCAATGTGGC CAAGACCAAG 
1251 AACAACCTGG CTTCCTGCTA CCTGAAGCAG GGCAAGTACC AGGATGCGGA 
1301 GACCTTGTAC AAGGAGATCC TCACCCGCGC TCATGAGAAA GAGTTTGGCT 
1351 CTGTCAATGG GGACAACAAG CCCATCTGGA TGCACGCAGA GGAGCGGGAG 
1401 GAAAGCAAGG ATAAGCGCCG GGACAGCGCC CCCTATGGGG AATACGGCAG 
1451 CTGGTACAAG GCCTGTAAAG TAGACAGCCC CACAGTCAAC ACCACCCTGC 
1501 GCAGCTTGGG GGCCCTATAC CGGCGCCAGG GCAAGCTGGA AGCCGCGCAC 
1551 ACACTAGAGG ACTGTGCCAG CCGTAACCGC AAGCAGGGTT TGGACCCCGC 
1601 AAGCCAGACC AAGGTGGTAG AACTGCTGAA AGATGGCAGT GGCAGGCGGG 
1651 GAGACCGCCG CAGCAGCCGA GACATGGCTG GGGGTGCCGG GCCTCGGTCT 
1701 GAGTCTGACC TCGAGGACGT GGGACCTACA GCTGAGTGGA ATGGGGATGG 
1751 CAGTGGCTCC TTGAGGCGCA GCGGTTCCTT TGGGAAACTC CGGGATGCCC 
1801 TGAGGCGCAG CAGTGAGATG CTGGTAAAGA AGCTGCAGGG GGGCACCCCC 
1851 CAGGAGCCCC CTAACCCCAG GATGAAGCGG GCCAGTTCCC TCAACTTCCT 
1901 CAACAAGAGC GTGGAAGAGC CGACCCAGCC TGGAGGCACA GGTCTCTCTG 
1951 ACAGCCGCAC TCTCAGCTCC AGCTCCATGG ACCTCTCCCG ACGAAGCTCC 
2001 CTGGTGGGCT AATGCTGAAG GGGCAGCCAG TCACCAGAGC GCCCACCTGG 
2051 CACACCCCCC TCACCCCAGC CCTGCGCATG GGCCTGCTGC TTGTCCCGCC 
2101 TGTCTCTCCC ACAGCCCCTG TCTTTTCTGT TCAATCTCAG GGTAACCTTC 
2151 TCCCTTGTCA TCTCAGCCTG AGCCCTGGAG GCTGGGCCTG CCCACTCCAG 
2201 CTCCATCCCT TATTTATTCC TTCCAGCAGG GCCCTCTTCC CTAGGTTCGG 
2251 GCCAGCAGGA GGTGCCGGCT GGAGTCTCCA CCATAGACTC AGTGGCCTGG 
2301 CCTCCCCAGA CCCCAGAGCC AAGAACACTA AGCACTCGCC GGCCCTTCGG 
2351 CACCCTCGCC CTCCCTCCCG ACTCAACCCG GCCGTTGCTT CTGTATATAG 
2401 AGAAATAAGT TATTGGCCGC GCGCCTCCCT TCAGTCCACG GTACTACCCG 
2451 GGCCTCCCCT CGTCCCTCTT CTAGTGGTAC CGCCCAGGCC TTAATCACCC 
2501 CCATTCCGTG CGGTGGTATC TCCCAGGCTC TACATTCTCG GGAGCGGCGC 
2551 CTCCCAAGGG GGTCCTGGGA CCTTCTCGCG CTCCTCCTGG CCTCTGAGGG 
2601 ATGCGTCCTA CCCGCGCCAT CGCCCCGTGG CCCAGGACGG GGACCTCCCC 
2651 TTAGTCCGTC CTCCCACCGC CGGGCCCTGC CCCGCATCCC GGCCTTATGC 
2701 ACTGCCCCTC CCACCCGGCC CCGCCCAGGC ACGGCCGACC CCGCCCCGGG 
2751 CACCGCCCAC CGAGCCATCC TGCCTCGCCT CCCCCCACGC CTGCAGCTTC 
2801 TCGCGAGGGG CGGCGACGGT CCCCTGGTGG CAGGAGGGGC TCCCCCTGTT 
2851 GCGGGTGAGG CGGCTGCTCT CTATTTTCAG ATGTTGCTGT AGAAATAAAG 
2901 ACGGTTTAAA TCTGAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2951 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AA 



BLAST Results 



No BLAST result 



Medline entries 



98288268: 

Two kinesin light chain genes in mice. Identification and 
characterization of the encoded proteins. 



Peptide information for frame 3 



ORF from 144 bp to 2009 bp; peptide length: 622 
Category: strong similarity to known protein 
Prosite motifs: RGD (502-505) 
KINESIN_LIGHT (223-265) 
KINESIN_LIGHT (265-307) 
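
The motif positions listed for this peptide can be reproduced with a simple pattern scan. The sketch below is an illustration only, not the tool used for the report: the function name `scan` and the regular-expression renderings of the PROSITE patterns are assumptions, and the convention that a hit is reported as a 1-based start with an end one past the last residue is inferred from the ranges printed in this report (for example RGD at 502-505).

```python
import re

# PROSITE patterns rewritten as Python regular expressions (illustrative subset).
PATTERNS = {
    "RGD": r"RGD",                          # PS00016: R-G-D
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # PS00005: [ST]-x-[RK]
}


def scan(seq, first_residue=1):
    """Return (name, start, end) hits; start is 1-based, end is start + motif length."""
    hits = []
    for name, regex in PATTERNS.items():
        # A lookahead around a capture group lets overlapping hits be reported.
        for m in re.finditer(f"(?=({regex}))", seq):
            start = m.start() + first_residue
            end = start + len(m.group(1))
            hits.append((name, start, end))
    return sorted(hits, key=lambda h: (h[1], h[0]))


# Residues 501-510 of the peptide shown in this report.
fragment = "RRGDRRSSRD"
hits = scan(fragment, first_residue=501)
for name, start, end in hits:
    print(f"{name} {start}->{end}")
```

Scanning only a fragment keeps the example short; passing the full 622-residue peptide with `first_residue=1` would apply the same logic to the whole sequence.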



1 MAMMVFPREE KLSQDEIVLG TKAVIQGLET LRGEHRALLA PLVAPEAGEA 
51 EPGSQERCIL LRRSLEAIEL GLGEAQVILA LSSHLGAVES EKQKLRAQVR 
101 RLVQENQWLR EELAGTQQKL QRSEQAVAQL EEEKQHLLFM SQIRKLDEDA 
151 SPNEEKGDVP KDTLDDLFPN EDEQSPAPSP GGGDVSGQHG GYEIPARLRT 
201 LHNLVIQYAS QGRYEVAVPL CKQALEDLEK TSGHDHPDVA TMLNILALVY 
251 RDQNKYKEAA HLLNDALAIR EKTLGKDHPA VAATLNNLAV LYGKRGKYKE 
301 AEPLCKRALE IREKVLGKFH PDVAKQLSNL ALLCQNQGKA EEVEYYYRRA 
351 LEIYATRLGP DDPNVAKTKN NLASCYLKQG KYQDAETLYK EILTRAHEKE 
401 FGSVNGDNKP IWMHAEEREE SKDKRRDSAP YGEYGSWYKA CKVDSPTVNT 
451 TLRSLGALYR RQGKLEAAHT LEDCASRNRK QGLDPASQTK VVELLKDGSG 
501 RRGDRRSSRD MAGGAGPRSE SDLEDVGPTA EWNGDGSGSL RRSGSFGKLR 
551 DALRRSSEML VKKLQGGTPQ EPPNPRMKRA SSLNFLNKSV EEPTQPGGTG 
601 LSDSRTLSSS SMDLSRRSSL VG 

BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_4h6, frame 3 

TREMBL:AF055666_1 gene: "Klc2"; product: "kinesin light chain 2"; Mus 
musculus kinesin light chain 2 (Klc2) mRNA, complete cds., N = 1, Score 
= 2824, P = 4e-294 

PIR: 153013 kinesin light chain - human, N = 1, Score = 1927, P = 
4.5e-199 

PIR:C41539 kinesin light chain C - rat, N = 1, Score = 1919, P = 
3.2e-198 

SWISSPROT:KNLC_RAT KINESIN LIGHT CHAIN (KLC)., N = 1, Score = 1919, P = 
3.2e-198 



>TREMBL:AF055666_1 gene: "Klc2"; product: "kinesin light chain 2"; Mus 
musculus kinesin light chain 2 (Klc2) mRNA, complete cds. 
Length = 599 

HSPs: 

Score = 2824 (423.7 bits), Expect = 4.0e-294, P = 4.0e-294 
Identities = 558/598 (93%), Positives = 572/598 (95%) 
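
The percentages in the Identities and Positives fields can be checked from the counts. The sketch below assumes the percentages are truncated to a whole number rather than rounded; that is an inference from the figures printed here (572/598 is about 95.65%, yet 95% is shown), not a documented property of the program that produced the report. The function name is hypothetical.

```python
def blast_percent(matches, aligned_length):
    """Whole-number percentage, truncated rather than rounded."""
    return (100 * matches) // aligned_length


# Counts taken from the HSP header above.
identities = blast_percent(558, 598)
positives = blast_percent(572, 598)
print(f"Identities = 558/598 ({identities}%), Positives = 572/598 ({positives}%)")
```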



Query:   1 MAMMVFPREEKLSQDEIVLGTKAVIQGLETLRGEHRALLAPLVAPEAGEAEPGSQERCIL 60 
           MA MV PREEKLSQDEIVLGTKAVIQGLETLRGEHRALLAPL +  EAGEAEPGSQERC+L 
Sbjct:   1 MATMVLPREEKLSQDEIVLGTKAVIQGLETLRGEHRALLAPLASHEAGEAEPGSQERCLL 60 

Query:  61 LRRSLEAIELGLGEAQVILALSSHLGAVESEKQKLRAQVRRLVQENQWLREELAGTQQKL 120 
           LRRSLEAIELGLGEAQVILALSSHLGAVESEKQKLRAQVRRLVQENQWLREELAGTQQKL 
Sbjct:  61 LRRSLEAIELGLGEAQVILALSSHLGAVESEKQKLRAQVRRLVQENQWLREELAGTQQKL 120 

Query: 121 QRSEQAVAQLEEEKQHLLFMSQIRKLDEDASPNEEKGDVPKDTLDDLFPNEDEQSPAPSP 180 
           QRSEQAVAQLEEEKQHLLFMSQIRKLDE   P EEKGDVPKD+LDDLFPNEDEQSPAPSP 
Sbjct: 121 QRSEQAVAQLEEEKQHLLFMSQIRKLDE-MLPQEEKGDVPKDSLDDLFPNEDEQSPAPSP 179 

Query: 181 GGGDVSGQHGGYEIPARLRTLHNLVIQYASQGRYEVAVPLCKQALEDLEKTSGHDHPDVA 240 
           GGGDV+ QHGGYEIPARLRTLHNLVIQYASQGRYEVAVPLCKQALEDLEKTSGHDHPDVA 
Sbjct: 180 GGGDVAAQHGGYEIPARLRTLHNLVIQYASQGRYEVAVPLCKQALEDLEKTSGHDHPDVA 239 

Query: 241 TMLNILALVYRDQNKYKEAAHLLNDALAIREKTLGKDHPAVAATLNNLAVLYGKRGKYKE 300 
           TMLNILALVYRDQNKYK+AAHLLNDALAIREKTLGKDHPAVAATLNNLAVLYGKRGKYKE 
Sbjct: 240 TMLNILALVYRDQNKYKDAAHLLNDALAIREKTLGKDHPAVAATLNNLAVLYGKRGKYKE 299 

Query: 301 AEPLCKRALEIREKVLGKFHPDVAKQLSNLALLCQNQGKAEEVEYYYRRALEIYATRLGP 360 
           AEPLCKRALEIREKVLGKFHPDVAKQLSNLALLCQNQGKAEEVEYYYRRALEIYATRLGP 
Sbjct: 300 AEPLCKRALEIREKVLGKFHPDVAKQLSNLALLCQNQGKAEEVEYYYRRALEIYATRLGP 359 

Query: 361 DDPNVAKTKNNLASCYLKQGKYQDAETLYKEILTRAHEKEFGSVNGDNKPIWMHAEEREE 420 
           DDPNVAKTKNNLASCYLKQGKYQDAETLYKEILTRAHEKEFGSVNG+NKPIWMHAEEREE 
Sbjct: 360 DDPNVAKTKNNLASCYLKQGKYQDAETLYKEILTRAHEKEFGSVNGENKPIWMHAEEREE 419 

Query: 421 SKDKRRDSAPYGEYGSWYKACKVDSPTVNTTLRSLGALYRRQGKLEAAHTLEDCASRNRK 480 
           SKDKRRD  P  EYGSWYKACKVDSPTVNTTLR+LGALYR +GKLEAAHTLEDCASR+RK 
Sbjct: 420 SKDKRRDRRPM-EYGSWYKACKVDSPTVNTTLRTLGALYRPEGKLEAAHTLEDCASRSRK 478 

Query: 481 QGLDPASQTKVVELLKDGSGRRGDRRSSRDMAGGAGPRSESDLEDVGPTAEWNGDGSGSL 540 
           QGLDPASQTKVVELLKDGSGR G RR SRD+AG   P+SESDLE+ GP AEW+GDGSGSL 
Sbjct: 479 QGLDPASQTKVVELLKDGSGR-GHRRGSRDVAG---PQSESDLEESGPAAEWSGDGSGSL 534 

Query: 541 RRSGSFGKLRDALRRSSEMLVKKLQGGTPQEPPNPRMKRASSLNFLNKSVEEPTQPGG 598 
           RRSGSFGKLRDALRRSSEMLV+KLQGG PQEP N RMKRASSLNFLNKSVEEP QPGG 
Sbjct: 535 RRSGSFGKLRDALRRSSEMLVRKLQGGGPQEP-NSRMKRASSLNFLNKSVEEPVQPGG 591 



Pedant information for DKFZphtes3_4h6, frame 3 



Report for DKFZphtes3_4h6.3 



[LENGTH] 622 

[MW] 68934.82 

[pI] 6.72 

[HOMOL] TREMBL:AF055666_1 gene: "Klc2"; product: "kinesin light chain 2"; Mus musculus 
kinesin light chain 2 (Klc2) mRNA, complete cds. 0.0 

[BLOCKS] BL00927C Trehalase proteins 

[BLOCKS] BL01160I Kinesin light chain repeat proteins 

[BLOCKS] BL01160H Kinesin light chain repeat proteins 

[BLOCKS] BL01160G Kinesin light chain repeat proteins 

[BLOCKS] BL01160F Kinesin light chain repeat proteins 

[BLOCKS] BL01160E Kinesin light chain repeat proteins 

[BLOCKS] BL01160D Kinesin light chain repeat proteins 

[BLOCKS] BL01160C Kinesin light chain repeat proteins 

[BLOCKS] BL01160B Kinesin light chain repeat proteins 

[BLOCKS] BL01160A Kinesin light chain repeat proteins 

[SUPFAM] tetratricopeptide repeat homology 1e-07 

[PROSITE] RGD 1 

[PROSITE] MYRISTYL 8 

[PROSITE] KINESIN_LIGHT 2 

[PROSITE] AMIDATION 2 

[PROSITE] CAMP_PHOSPHO_SITE 5 

[PROSITE] CK2_PHOSPHO_SITE 11 

[PROSITE] TYR_PHOSPHO_SITE 3 

[PROSITE] PKC_PHOSPHO_SITE 7 

[PROSITE] ASN_GLYCOSYLATION 2 

[PFAM] Kinesin light chain repeat 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 12.54 % 

[KW] COILED_COIL 4.98 % 

SEQ MAMMVFPREEKLSQDEIVLGTKAVIQGLETLRGEHRALLAPLVAPEAGEAEPGSQERCIL 

SEG 

PRD ccccchhhhhhhhhhhhhchhhhhhhhhhhhhhchhhhhhhhhhhhhcccccccchhhhh 

COILS 

SEQ LRRSLEAIELGLGEAQVILALSSHLGAVESEKQKLRAQVRRLVQENQWLREELAGTQQKL 

SEG 

PRD hhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhh 

COILS CCCCCCCCCCCC 

SEQ QRSEQAVAQLEEEKQHLLFMSQIRKLDEDASPNEEKGDVPKDTLDDLFPNEDEQSPAPSP 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccccccccccccccccc 

COILS CCCCCCCCCCCCCCCCCCC 

SEQ GGGDVSGQHGGYEIPARLRTLHNLVIQYASQGRYEVAVPLCKQALEDLEKTSGHDHPDVA 

SEG 

PRD cccccccccccccchhhhhhhhhhhhhhhccceeeeeehhhhhhhhhhhhhccccccchh 

COILS 



SEQ TMLNILALVYRDQNKYKEAAHLLNDALAIREKTLGKDHPAVAATLNNLAVLYGKRGKYKE 

SEG xxxxxxxxxxxx 

PRD hhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhcccccchh 

COILS 



SEQ AEPLCKRALEIREKVLGKFHPDVAKQLSNLALLCQNQGKAEEVEYYYRRALEIYATRLGP 

SEG 

PRD hhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhccc 

COILS 

SEQ DDPNVAKTKNNLASCYLKQGKYQDAETLYKEILTRAHEKEFGSVNGDNKPIWMHAEEREE 

SEG xxxxx 

PRD ccccccchhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhccccccccchhhhhhhhhh 

COILS 



SEQ SKDKRRDSAPYGEYGSWYKACKVDSPTVNTTLRSLGALYRRQGKLEAAHTLEDCASRNRK 

SEG xxxxxxxx 

PRD hhhhhccccccccccccceeeeccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 



SEQ QGLDPASQTKVVELLKDGSGRRGDRRSSRDMAGGAGPRSESDLEDVGPTAEWNGDGSGSL 

SEG xxxxxxxxxxxxxx .xxxxx 

PRD hhccchhhhhhhhhhccccccccccccccccccccccccccccccceeeecccccccccc 

COILS 



SEQ RRSGSFGKLRDALRRSSEMLVKKLQGGTPQEPPNPRMKRASSLNFLNKSVEEPTQPGGTG 

SEG xxxxxxxxxx xxxx 

PRD ccccccchhhhhhhhhhhhhhhhhhcccccccccchhhhhhhcccccccccccccccccc 

COILS 



SEQ LSDSRTLSSSSMDLSRRSSLVG 

SEG xxxxxxxxxxxxxxxxxxxx. . 

PRD cccccccccccchhhhhhcccc 

COILS 
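
The LOW_COMPLEXITY and COILED_COIL percentages in this report can be checked against the SEG and COILS tracks above by dividing the number of masked positions by the peptide length. The sketch below is a hand check under stated assumptions: the run lengths were counted from the printed `x` and `C` stretches, the `.` characters in the SEG track are treated as unmasked, and the function name is hypothetical.

```python
def masked_percent(run_lengths, protein_length):
    """Percentage of residues covered by masked runs, to two decimals."""
    return round(100.0 * sum(run_lengths) / protein_length, 2)


# Run lengths read off the printed tracks (an assumption, counted by hand).
SEG_RUNS = [12, 5, 8, 14, 5, 10, 4, 20]   # 'x' stretches in the SEG track
COILS_RUNS = [12, 19]                     # 'C' stretches in the COILS track
LENGTH = 622                              # [LENGTH] from the report

low_complexity = masked_percent(SEG_RUNS, LENGTH)
coiled_coil = masked_percent(COILS_RUNS, LENGTH)
print(f"LOW_COMPLEXITY {low_complexity} %  COILED_COIL {coiled_coil} %")
```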



Prosite for DKFZphtes3_4h6.3 

PS00001  449->453  ASN_GLYCOSYLATION   PDOC00001 
PS00001  587->591  ASN_GLYCOSYLATION   PDOC00001 
PS00004  425->429  CAMP_PHOSPHO_SITE   PDOC00004 
PS00004  505->509  CAMP_PHOSPHO_SITE   PDOC00004 
PS00004  554->558  CAMP_PHOSPHO_SITE   PDOC00004 
PS00004  578->582  CAMP_PHOSPHO_SITE   PDOC00004 
PS00004  616->620  CAMP_PHOSPHO_SITE   PDOC00004 
PS00005   30->33   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   90->93   PKC_PHOSPHO_SITE    PDOC00005 
PS00005  451->454  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  499->502  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  507->510  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  539->542  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  615->618  PKC_PHOSPHO_SITE    PDOC00005 
PS00006   13->17   CK2_PHOSPHO_SITE    PDOC00006 
PS00006  151->155  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  163->167  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  232->236  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  470->474  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  507->511  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  519->523  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  521->525  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  568->572  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  589->593  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  610->614  CK2_PHOSPHO_SITE    PDOC00006 
PS00007  339->346  TYR_PHOSPHO_SITE    PDOC00007 
PS00007  339->347  TYR_PHOSPHO_SITE    PDOC00007 
PS00007  424->432  TYR_PHOSPHO_SITE    PDOC00007 
PS00008   71->77   MYRISTYL            PDOC00008 
PS00008   86->92   MYRISTYL            PDOC00008 
PS00008  182->188  MYRISTYL            PDOC00008 
PS00008  187->193  MYRISTYL            PDOC00008 
PS00008  402->408  MYRISTYL            PDOC00008 
PS00008  482->488  MYRISTYL            PDOC00008 
PS00008  598->604  MYRISTYL            PDOC00008 
PS00008  600->606  MYRISTYL            PDOC00008 
PS00009  292->296  AMIDATION           PDOC00009 
PS00009  499->503  AMIDATION           PDOC00009 
PS00016  502->505  RGD                 PDOC00016 
PS01160  223->265  KINESIN_LIGHT       PDOC00893 
PS01160  265->307  KINESIN_LIGHT       PDOC00893 



Pfam for DKFZphtes3_4h6.3 

HMM_NAME Kinesin light chain repeat 

HMM *RALEDREKtlGHDHPDVAtMLNNLALvCRNQNKYeEveNYYN* 
+ALED+EKT+GHDHPDVATMLN+LALV+R+QNKY+E++ ++N 
Query 223 QALEDLEKTSGHDHPDVATMLNILALVYRDQNKYKEAAHLLN 264 

50.46 265 306 1 42 dkfzphtes3_4h6.3 strong similarity to Kinesin light chain 
Alignment to HMM consensus: 
HMM *RALEDREKtlGHDHPDVAtMLNNLALvCRNQNKYeEveNYYN* 
AL +REKTLG DHP VA LNNLA+++ ++KY+E+E + + 
Query 265 DALAIREKTLGKDHPAVAATLNNLAVLYGKRGKYKEAEPLCK 306 

307 348 1 42 dkfzphtes3_4h6.3 strong similarity to Kinesin light chain 
Alignment to HMM consensus: 
HMM *RALEDREKtlGHDHPDVAtMLNNLALvCRNQNKYeEveNYYN* 
RALE+REK+LG HPDVA++L+NLAL+C+NQ+K EEVE YY+ 
Query 307 RALEIREKVLGKFHPDVAKQLSNLALLCQNQGKAEEVEYYYR 348 

39.10 349 390 1 42 dkfzphtes3_4h6.3 strong similarity to Kinesin light chain 
Alignment to HMM consensus: 
HMM *RALEDREKtlGHDHPDVAtMLNNLALvCRNQNKYeEveNYYN* 
RALE+ LG D P+VA+ NNLA + Q+KY+++E +Y+ 
Query 349 RALEIYATRLGPDDPNVAKTKNNLASCYLKQGKYQDAETLYK 390 



DKFZphtes3_4ol9 



group: testes derived 

DKFZphtes3_4ol9 encodes a novel 1180 amino acid protein with weak similarity to human 
megakaryocyte stimulating factor and human mucin. 

The novel protein contains a cytochrome c family heme-binding site signature. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to megakaryocyte stimulating factor and mucin 

complete cDNA, complete cds, EST hits (few) 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 3767 bp 

Poly A stretch at pos. 3757, polyadenylation signal at pos. 3737 
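
The two positions on this line can be located in the printed sequence. The sketch below is an illustration under assumptions: it searches for the canonical AATAAA hexamer and a trailing run of A residues, and it reports 0-based offsets, because those are the values that agree with the positions printed in this report (3737 and 3757). The function name, the `min_tail` threshold, and the 0-based convention are inferred or hypothetical, not stated in the source.

```python
def poly_a_features(seq, offset=0, signal="AATAAA", min_tail=8):
    """Return (signal_pos, tail_start) as 0-based offsets, or None where absent.

    offset is the 0-based position of seq[0] in the full insert.
    """
    tail_len = len(seq) - len(seq.rstrip("A"))
    body_end = len(seq) - tail_len
    tail_start = offset + body_end if tail_len >= min_tail else None
    # Last occurrence of the signal upstream of the poly(A) tail.
    idx = seq.rfind(signal, 0, body_end)
    signal_pos = offset + idx if idx >= 0 else None
    return signal_pos, tail_start


# Last 67 nucleotides of the insert (rows 3701 and 3751 of the listing below).
tail = (
    "CTTCGTGGGA" "GGCACTCATG" "GCTCTCTGGG" "TCTAATGAAT" "AAAGTCCTCC"
    "ACAGCCTAAA" "AAAAAAA"
)
signal_pos, tail_start = poly_a_features(tail, offset=3700)
print(f"polyadenylation signal at {signal_pos}, poly A stretch at {tail_start}")
```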

1 GGCTAGGTTT AGCTTCAGGG GCAGCCCAGG GCAGTGTTGC TGCATATTGC 
51 ATGGATGAAA GGCTGAAGGC TGCCTCCTCT TGCAGGCTGG CTTCTGAGAT 

101 TGCACCTTCT TCTCCTGCTA CTCCTCCAAA TCTATGACCC TTCAAGGCAG 

151 AGCTGACCTG TCCGGTAATC AAGGCAATGC AGCCGGCCGC CTAGCTACAG 

201 TTCACGAGCC AGTTGTCACC CAGTGGGCGG TGCATCCTCC AGCCCCCGCT 

251 CACCCCAGTC TCCTGGACAA AATGGAGAAA GCGCCTCCAC AGCCCCAGCA 

301 CGAGGGCCTC AAGTCCAAGG AGCATCTTCC GCAACAGCCT GCCGAAGGCA 

351 AGACGGCGTC CCGCCGCGTC CCACGCCTCC GGGCTGTGGT CGAGAGCCAG 

401 GCCTTCAAGA ACATCCTGGT AGACGAGATG GACATGATGC ACGCCCGTGC 

451 AGCCACGCTC ATCCAAGCCA ACTGGAGGGG CTATTGGCTC CGGCAGAAGC 

501 TGATTTCCCA GATGATGGCG GCCAAGGCCA TCCAGGAGGC CTGGCGGCGC 

551 TTCAACAAGA GACACATCCT TCACTCCAGC AAGTCGTTGG TAAAGAAAAC 

601 GAGGGCGGAG GAGGGGGACA TACCTTATCA CGCCCCACAG CAGGTGCGCT 

651 TCCAGCATCC GGAAGAGAAC CGCCTTCTGT CCCCGCCCAT CATGGTGAAC 

701 AAGGAGACCC AGTTCCCTTC CTGTGACAAT CTGGTCCTCT GCAGACCCCA 

751 GTCGTCCCCC CTCCTGCAGC CCCCAGCAGC TCAGGGTACC CCAGAGCCCT 

801 GTGTGCAGGG TCCTCATGCT GCCAGAGTCC GGGGGCTGGC CTTCCTGCCA 

851 CACCAGACGG TCACCATCAG ATTTCCCTGC CCAGTGAGTT TGGACGCAAA 

901 ATGCCAGCCA TGCCTGCTGA CCAGAACCAT CAGAAGCACC TGCCTCGTCC 

951 ACATAGAGGG TGACTCAGTG AAGACCAAAC GTGTAAGTGC CCGGACCAAC 
1001 AAAGCCAGGG CTCCGGAGAC ACCATTGTCC AGAAGGTATG ACCAGGCAGT 
1051 TACGAGACCA TCCAGAGCCC AAACCCAGGG CCCTGTGAAA GCAGAGACCC 
1101 CCAAAGCCCC CTTCCAGATA TGTCCAGGGC CCATGATCAC CAAGACTCTA 
1151 CTCCAGACAT ATCCAGTGGT CTCCGTGACC CTGCCACAGA CATATCCAGC 
1201 GTCCACGATG ACCACCACCC CACCCAAGAC TAGCCCAGTT CCCAAAGTAA 
1251 CAATAATCAA GACCCCAGCC CAGATGTATC CGGGGCCCAC AGTGACCAAA 
1301 ACTGCACCTC ACACATGCCC CATGCCCACA ATGACCAAGA TCCAGGTACA 
1351 CCCCACAGCC TCCAGAACTG GCACCCCACG GCAGACATGC CCTGCGACCA 
1401 TCACGGCAAA GAACCGACCT CAGGTTTCCC TTCTGGCTTC CATCATGAAG 
1451 AGCCTGCCCC AGGTATGCCC GGGGCCTGCG ATGGCAAAGA CCCCACCCCA 
1501 GATGCACCCG GTCACCACCC CAGCCAAAAA CCCATTGCAA ACATGTCTGT 
1551 CAGCCACAAT GTCCAAGACT TCATCCCAGA GGAGCCCAGT TGGGGTGACC 
1601 AAGCCCTCAC CCCAGACCCG CCTGCCAGCC ATGATAACCA AGACCCCAGC 
1651 CCAGTTACGC TCGGTGGCCA CCATCCTCAA GACTCTGTGT CTGGCCTCTC 
1701 CAACAGTGGC AAATGTCAAG GCTCCACCCC AAGTGGCGGT AGCAGCCGGA 
1751 ACTCCCAACA CCTCAGGCTC CATCCATGAG AACCCACCCA AGGCCAAGGC 
1801 CACCGTGAAT GTGAAGCAGG CTGCAAAGGT GGTGAAAGCC TCATCCCCCT 
1851 CCTATTTGGC TGAGGGGAAG ATCAGGTGCC TGGCTCAACC ACATCCGGGA 
1901 ACTGGGGTCC CCAGGGCTGC AGCTGAGCTT CCTTTGGAAG CCGAGAAAAT 
1951 CAAGACTGGC ACCCAGAAAC AGGCGAAAAC AGACATGGCA TTTAAGACCA 
2001 GTGTGGCAGT GGAAATGGCT GGGGCTCCAT CCTGGACAAA AGTTGCTGAG 
2051 GAAGGGGACA AGCCACCTCA CGTGTATGTG CCTGTAGACA TGGCTGTCAC 
2101 CCTGCCCCGG GGACAGCTGG CTGCCCCACT GACCAATGCC TCATCCCAGA 
2151 GACATCCACC CTGCCTGTCC CAGAGACCAC TGGCCGCCCC GCTGACCAAG 
2201 GCCTCATCTC AGGGACATCT GCCCACTGAG CTGACCAAGA CCCCATCCCT 
2251 GGCCCATCTG GACACCTGTC TGAGCAAGAT GCATTCCCAG ACACATCTGG 
2301 CCACAGGTGC CGTGAAGGTC CAGTCCCAAG CGCCTCTAGC CACCTGTCTG 
2351 ACCAAGACGC AGTCCCGGGG GCAGCCGATC ACAGACATAA CCACGTGCCT 
2401 CATCCCAGCG CACCAGGCTG CTGATCTCAG CAGCAACACC CACTCCCAGG 
2451 TGCTCCTAAC AGGGTCCAAG GTGTCCAACC ACGCCTGCCA GCGCCTCGGT 
2501 GGCCTCAGCG CCCCACCCTG GGCCAAGCCA GAGGACAGAC AGACCCAGCC 
2551 ACAGCCCCAC GGACACGTGC CGGGGAAGAC CACTCAGGGG GGACCATGCC 
2601 CGGCAGCCTG TGAGGTCCAG GGTATGCTGG TGCCGCCGAT GGCACCCACC 
2651 GGCCATTCCA CATGCAACGT TGAGTCCTGG GGAGACAACG GAGCCACACG 
2701 TGCCCAGCCA TCAATGCCCG GCCAGGCGGT GCCCTGCCAG GAGGACACGG 
2751 GCCCCGCGGA CGCTGGTGTG GTTGGTGGCC AATCGTGGAA CCGCGCATGG 
2801 GAGCCAGCCA GGGGTGCTGC GTCCTGGGAC ACCTGGCGCA ACAAGGCGGT 
2851 GGTGCCTCCC AGGCGGTCCG GGGAGCCAAT GGTGTCCATG CAGGCTGCAG 
2901 AGGAGATCCG CATCCTCGCA GTGATCACTA TCCAGGCGGG CGTCCGTGGC 
2951 TACCTGGCGC GTCGCAGGAT CCGGCTGTGG CACCGGGGGG CCATGGTCAT 
3001 CCAAGCTACT TGGCGCGGCT ACCGTGTGCG GCGGAACCTG GCACACCTCT 
3051 GCAGAGCCAC CACGACCATC CAGTCTGCCT GGCGCGGCTA CAGCACCCGC 
3101 CGGGACCAAG CCCGGCACTG GCAGATGCTC CACCCCGTCA CGTGGGTGGA 
3151 GCTGGGCAGC CGGGCCGGGG TCATGTCTGA CCGAAGCTGG TTCCAGGATG 
3201 GCAGAGCCAG GACAGTATCT GACCATCGCT GCTTCCAGTC CTGCCAGGCA 
3251 CACGCTTGCA GCGTCTGCCA CTCCCTGAGC TCCAGGATCG GGAGCCCGCC 
3301 CAGCGTGGTG ATGCTAGTGG GCTCCAGCCC TCGCACCTGT CATACCTGTG 
3351 GACGCACACA GCCCACCCGT GTGGTGCAGG GCATGGGCCA GGGCACTGAG 
3401 GGCCCCGGGG CAGTGTCTTG GGCCTCCGCC TACCAGCTGG CTGCCCTGAG 
3451 TCCCAGGCAG CCGCATCGCC AGGACAAAGC GGCCACAGCC ATCCAGTCCG 
3501 CCTGGAGGGG CTTTAAGATC CGCCAGCAGA TGAGGCAGCA GCAAATGGCA 
3551 GCGAAGATAG TTCAAGCCAC CTGGCGAGGC CACCATACCC GGAGCTGTCT 
3601 GAAGAACACA GAGGCGCTCT TGGGACCAGC AGACCCCTCG GCCAGCTCAC 
3651 GGCACATGCA TTGGCCTGGC ATCTAGGACC CTGGCTCCCT GCAGTGGGGA 
3701 CTTCGTGGGA GGCACTCATG GCTCTCTGGG TCTAATGAAT AAAGTCCTCC 
3751 ACAGCCTAAA AAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 134 bp to 3673 bp; peptide length: 1180 
Category: similarity to known protein 
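
The reading frame and peptide length follow from the ORF coordinates. The sketch below assumes 1-based, inclusive coordinates with the stop codon lying outside the stated range; both assumptions are consistent with the values printed in these reports but are not stated in them. The function name is hypothetical.

```python
def orf_summary(start, end):
    """Return (frame, peptide_length) for a 1-based, inclusive ORF range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    frame = (start - 1) % 3 + 1   # frames numbered 1, 2, 3
    return frame, length // 3


# ORF from 134 bp to 3673 bp (this entry) and 144 bp to 2009 bp (DKFZphtes3_4h6).
print(orf_summary(134, 3673))
print(orf_summary(144, 2009))
```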



1 MTLQGRADLS GNQGNAAGRL ATVHEPVVTQ WAVHPPAPAH PSLLDKMEKA 
51 PPQPQHEGLK SKEHLPQQPA EGKTASRRVP RLRAVVESQA FKNILVDEMD 
101 MMHARAATLI QANWRGYWLR QKLISQMMAA KAIQEAWRRF NKRHILHSSK 
151 SLVKKTRAEE GDIPYHAPQQ VRFQHPEENR LLSPPIMVNK ETQFPSCDNL 
201 VLCRPQSSPL LQPPAAQGTP EPCVQGPHAA RVRGLAFLPH QTVTIRFPCP 
251 VSLDAKCQPC LLTRTIRSTC LVHIEGDSVK TKRVSARTNK ARAPETPLSR 
301 RYDQAVTRPS RAQTQGPVKA ETPKAPFQIC PGPMITKTLL QTYPVVSVTL 
351 PQTYPASTMT TTPPKTSPVP KVTIIKTPAQ MYPGPTVTKT APHTCPMPTM 
401 TKIQVHPTAS RTGTPRQTCP ATITAKNRPQ VSLLASIMKS LPQVCPGPAM 
451 AKTPPQMHPV TTPAKNPLQT CLSATMSKTS SQRSPVGVTK PSPQTRLPAM 
501 ITKTPAQLRS VATILKTLCL ASPTVANVKA PPQVAVAAGT PNTSGSIHEN 
551 PPKAKATVNV KQAAKVVKAS SPSYLAEGKI RCLAQPHPGT GVPRAAAELP 
601 LEAEKIKTGT QKQAKTDMAF KTSVAVEMAG APSWTKVAEE GDKPPHVYVP 
651 VDMAVTLPRG QLAAPLTNAS SQRHPPCLSQ RPLAAPLTKA SSQGHLPTEL 
701 TKTPSLAHLD TCLSKMHSQT HLATGAVKVQ SQAPLATCLT KTQSRGQPIT 
751 DITTCLIPAH QAADLSSNTH SQVLLTGSKV SNHACQRLGG LSAPPWAKPE 
801 DRQTQPQPHG HVPGKTTQGG PCPAACEVQG MLVPPMAPTG HSTCNVESWG 
851 DNGATRAQPS MPGQAVPCQE DTGPADAGVV GGQSWNRAWE PARGAASWDT 
901 WRNKAVVPPR RSGEPMVSMQ AAEEIRILAV ITIQAGVRGY LARRRIRLWH 
951 RGAMVIQATW RGYRVRRNLA HLCRATTTIQ SAWRGYSTRR DQARHWQMLH 
1001 PVTWVELGSR AGVMSDRSWF QDGRARTVSD HRCFQSCQAH ACSVCHSLSS 
1051 RIGSPPSVVM LVGSSPRTCH TCGRTQPTRV VQGMGQGTEG PGAVSWASAY 
1101 QLAALSPRQP HRQDKAATAI QSAWRGFKIR QQMRQQQMAA KIVQATWRGH 
1151 HTRSCLKNTE ALLGPADPSA SSRHMHWPGI 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_4ol9, frame 2 

TREMBL:HSU70136_1 product: "megakaryocyte stimulating factor"; Human 
megakaryocyte stimulating factor mRNA, complete cds., N = 2, Score = 
242, P = 9.6e-16 

TREMBL:HSMUC2A_1 gene: "MUC2"; product: "mucin"; Human mucin-2 gene, 
partial cds., N = 1, Score = 204, P = 1.4e-12 

PIR:S48478 glucan 1,4-alpha-glucosidase (EC 3.2.1.3) - yeast 
(Saccharomyces cerevisiae), N = 1, Score = 192, P = 9.6e-11 



>TREMBL:HSU70136_1 product: "megakaryocyte stimulating factor"; Human 
megakaryocyte stimulating factor mRNA, complete cds. 
Length = 1,404 

HSPs: 

Score = 242 (36.3 bits), Expect = 9.6e-16, Sum P(2) = 9.6e-16 
Identities = 145/546 (26%), Positives = 198/546 (36%) 

Query: 282 KRVSARTNKARAPETPLSRRYDQAVTRPSRAQTQGPVKAETPKAPFQIC-PGPMITKTLL 340 

K+ + T K AP TP PS + P T AP P P TK+ 

Sbjct: 488 KKPAPTTPKEPAPTTP-KEPAPTTTKEPSPTTPKEPAPTTTKSAPTTTKEPAPTTTKSAP 546 

Query: 341 QTYPVVSVTLPQ T Y PASTMTTT P PKT S PV- PKVT 1 1 KT P AQM Y PG PTVTKT APHTC 395 

T S T + TP TTP K +P PK TP + P PT TK 
Sbjct: 547 TTPKEPSPTTTKEPAPTTPKEPAPTTPKKPAPTTPKEPAPTTPKE — PAPTTTKK 599 

Query: 396 PMPTMTKIQVHPTASRTGTPRQTCPATITAKNRPQVSLLASIMKSLPQVCPGPAMAKTPP 455 

P PT K + PT TP++T P T LA P +A T P 

Sbjct: 600 PAPTAPK-EPAPT TPKETAPTTPKKLTPTTPEKLAPTTPEKPAPTTPEELAPTTP 653 

Query: 456 QMHPVTTPAKNPLQTCLSATMSKTSSQRSPVGVTKPSPQT-RLPAMIT-KTPAQLRSVAT 513 

+ TTP + P T A T + +P +P+P T + PA T K A T 
Sbjct: 654 EEPTPTTP-EEPAPTTPKAAAPNTPKEPAPTTPKEPAPTTPKEPAPTTPKETAPTTPKGT 712 

Query: 514 ILKTLCLASPTVANVKAPPQVAVAAG TPNTSGSIHENPPKAKATVNVKQAAKVV-KA 569 

TL +PT AP ++A T TS PK A K+ A K 

Sbjct: 713 APTTLKEPAPTTPKKPAPKELAPTTTKEPTSTTSDKPAPTTPKGTAPTTPKEPAPTTPKE 772 

Query: 570 SSPSYLAEGKIRCLAQPHPGTGVPRAAAELPLEAEKIKTGT--QKQAKTDMAFKTSVAVE 627 

+P+ L+PPT A EL KTT KAT +T+ 

Sbjct: 773 PAPTTPKGTAPTTLKEPAPTTPKKPAPKELAPTTTKGPTSTTSDKPAPTTPK-ETAPTTP 831 

Query: 628 MAGAPSWTKVAEEGDKPPHVYVPVDMAVTLPRGQLAAPLTNASSQRHPPCLSQRPLAAPL 687 

AP+ K+ PPV+P +S PLSPL 

Sbjct: 832 KEPAPTTPK — KPAPTTPETPPPTTSEVSTPTTTKEPTTIHKSPDESTPELSAEPTPKAL 889 

Query: 688 TKASSQGHLPTELTKTPSLA — HLDTCLSKMHSQTHLATGAVKVQSQAPLAT — CLTKTQ 743 

+ + +PT TKTP+ + T ++ L T + + AP T T T+ 

Sbjct: 890 ENSPKEPGVPT — TKTPAATKPEMTTTAKDKTTERDLRT-TPETTTAAPKMTKETATTTE 946 

Query: 744 SRGQPITDITTCLIPAHQAADLS— SNTHSQVLLTGSKVSN— HACQRLGGLSAPP-WAK 798 

+ TT++ D + T+ KV+ ++ P AK 

Sbjct: 947 KTTESKITATTTQVTSTTTQDTTPFKITTLKTTTLAPKVTTTKKTITTTEIMNKPEETAK 1006 

Query: 799 PEDRQTQPQPHGHVPGKTTQGGPCPAA 825 

P+DR T + P K T+ P + 

Sbjct: 1007 PKDRATNSKATTPKPQKPTKAPKKPTS 1033 

Score = 205 (30.8 bits), Expect = 3.1e-12, Sum P(2) = 3.1e-12 
Identities = 146/565 (25%), Positives = 209/565 (36%) 

Query: 281 TKRVSARTNKARAPETPLSRRYDQAVTRPSRAQTQGPVKAE — TPKAPFQICPGPMITKT 338 

TK+ + K AP TP +ATP+ PK TP+ P P + T 

Sbjct: 597 TKKPAPTAPKEPAPTTPK ETAPTTPKKLTPTTPEKLAPTTPEKPAPTTPEELAPTT 652 

Query: 339 LLQTYPVVSVTLPQTYPASTMTTTPPKTSPV-PKVTIIKTPAQMYPGPTVTK-TAPHTCP 396 

+ P T P + TP + +P PK TP + P PT K TAP T P 

Sbjct: 653 PEEPTPTTPEEPAPTTPKAAAPNTPKEPAPTTPKEPAPTTPKE — PAPTTPKETAP-TTP 709 

Query: 397 M PTMTKIQVHPTASRTGTPRQTCPATITAKNRPQVSLLASIMKSLPQVCPGPAMAKT 453 

PT K + PT + P++ PT +S+KP GAT 

Sbjct: 710 KGTAPTTLK-EPAPTTPKKPAPKELAPTT TKEPTSTTSD--KPAPTTPKGTAPT-T 761 

Query: 454 PPQMHPVTTPAKNPLQTCLSATMSKTSSQRSPVGVTKPSPQTRLPAMITKTPAQLRSVAT 513 

P + P TTP KPT T T + +P KP+P+ P TK P S 
Sbjct: 762 PKEPAP-TTP-KEPAPTTPKGTAPTTLKEPAPTTPKKPAPKELAPTT-TKGPTSTTSDKP 818 

Query: 514 ILKTLCLASPTVANVKAPPQVAVAAGTPNTSGSIHENPPKAKATVNVKQAAKVVKA 569 

T +PT AP APT E PP + V+ K+ + K+ 

Sbjct: 819 APTTPKETAPTTPKEPAPTTPKKPA--PTTP ETPPPTTSEVSTPTTTKEPTTIHKS 872 

Query: 570 SSPSYLAEGKIRCLAQPHPGTGVPRAAAELPLEAEKIKTGTQKQAKTDMAFKTSVAV 626 

S+P AE + L GVP + p + T T K T+ +T+ 

Sbjct: 873 PDESTPELSAEPTPKALENSPKEPGVP— TTKTPAATKPEMTTTAKDKTTERDLRTTPET 930 

Query: 627 EMAGAPSWTK-VAEEGDKPPHVYVPVDMAVTLPRGQLAAPLTNASSQRHPPCLSQRPLAA 685 

A AP TK A +K + +T Q+ + T ++ L LA 

Sbjct: 931 TTA-APKMTKETATTTEKT TESKITATTTQVTSTTTQDTTPFKITTLKTTTLAP 983 

Query: 686 PLTKASSQGHLPTELTKTPSLAHLDTCLSKMHSQTHLATGAVKVQS QAPLATCLT 740 

+T + + TE+ P +T K + AT K Q + P +T 

Sbjct: 984 KVT-TTKKTITTTEIMNKPE ETAKPKDRATNSKAT-TPKPQKPTKAPKKPTSTKKP 1037 

Query: 741 KTQSR-GQPI TDI T TCLIPAHQAADLSSNTHSQVLLTGSKVSNHACQRLGGLSAPP 795 

KT R +P T T T +P + Q ++ N + S 

Sbjct: 1038 KTMPRVRKPKTTPTPRKMTSTMPELNPTSRIAEAMLQTTTRPNQTPNSKLVEVNPKSEDA 1097 

Query: 796 W-AKPEDRQTQPQPHGHVPGKTTQGGPCPAACEVQGMLVPPMAPTGHSTCN 845 

A+ E +PH +P T P QG+++ PM + CN 

Sbjct: 1098 GGAEGETPHMLLRPHVFMPEVTPDMDYLPRVPN-QGIIINPMLSDETNICN 1147 

Score = 198 (29.7 bits), Expect = 2.3e-11, Sum P(2) = 2.3e-11 
Identities = 142/513 (27%), Positives = 200/513 (38%) 

Query: 204 RPQSSPLLQPPAAQGTPEPCVQGPHAARVRGLAFLPHQTVTIRFPCPVSLDAKCQPCLLT 263 

R + P +PP G + H V+ + +P L 

Sbjct: 207 RTKKKPTPKPPVVDEAGSGLDNGDFKVTTPDTSTTQHNKVSTSPKITTAKPINPRPSLPP 266 

Query: 264 R — T I RST C L VH I EG DS V KT KR V S ART N K ARA P ETPLSRRYDQAVTRPSR AQTQ 315 

T + T L + +V+TK + TNK + E S + Q++ + S AT 
Sbjct: 267 NSDTSKETSLTVNKETTVETKETTT-TNKQTSTDGKEKTTSAKETQSIEKTSAKDLAPTS 325 

Query: 316 GPVICAETPKAPFQICPGPMITKTLLQTYPVVSVTLPQTYPASTMTTTPPKTSPVPKVTII 375 

+ TPKA GP +T T + P T P+ PAST TP + +P + 
Sbjct: 326 KVLAKPTPKAE-TTTKGPALT-TPKEPTP TTPKE-PAST TPKEPTPTTIKSAP 375 

Query: 376 KTPAQMYPGPTVTKTAPHTC — PMPTMTKIQVHPTASRTGTPRQTC-PATITAKNRPQVS 432 

TP + P PT TK+AP T P PT TK + PT + P T PA T K+ P 
Sbjct: 376 TTPKE--PAPTTTKSAPTTPKEPAPTTTK-EPAPTTPKEPAPTTTKEPAPTTTKSAPTTP 432 

Query: 433 LLASIMKSLPQVCPGPAMAKTPPQMHPVTTPAKNPLQTCLSATMSKTSSQRSPVGVT 489

+ K P PA TP + P TTP KPT + T + +P 

Sbjct: 433 KEPAPTTPKKPAPTTPKEPAPT-TPKEPTP-TTP-KEPAPTTKEPAPT-TPKEPAPTAPK 488

Query: 490 KPSPQT-RLPAMIT-KTPAQLRSVA TILK TLCLAS PT VANVKAPPQVAVAAGT 540

KP+P T + PA T K PA + T K T ++PT AP AT 

Sbjct: 489 KPAPTTPKEPAPTTPKEPAPTTTKEPSPTTPKEPAPTTTKSAPTTTKEPAPTTTKSAPTT 548 

Query: 541 PNT-SGSIHENP PKAKATVNVKQAAKVV-KASSPSYLAEGKIRCLAQPHPGTGVPR 594 

PS++P PKA K+A K +P+ E +P P P+ 

Sbjct: 549 PKEPSPTTTKEPAPTTPKEPAPTTPKKPAPTTPKEPAPTTPKEPAPTTTKKPAPTA— PK 606 

Query: 595 AAAELPLEAEKIKTGTQKQAKTDMAFKTSVAVEMAGAPSWTK-VAEEGDKPPHVYVPVDM 653 

A* P ++ T K+ K + AP+ + +A + P P + 

Sbjct: 607 EPA— PTTPKETAPTTPKKLTPTTPEKLAPTTPEKPAPTTPEELAPTTPEEPTPTTPEEP 664 

Query: 654 AVTLPRGQLAAPLTNASSQRHP-PCLSQRPLAAPLTKASSQGHLPTELTKTPSLAHLDTC 712 

A T P+ AAP T +PP+PAPT PET T 

Sbjct: 665 APTTPKA — AAPNT PKEPAPTTPKEP — APTTPKEPAPTTPKETAPTTPKGTAPTT 716 

Query: 713 LSK 715 
L + 

Sbjct: 717 LKE 719 

Score = 108 (16.2 bits), Expect = 4.3e-02, Sum P(2) = 4.3e-02
Identities = 60/214 (28%), Positives = 85/214 (39%)

Query: 265 TIRSTCLVHIEGDSVKTKRVSAR-TNKA— RAPETP-LSRRYDQAVTRPSRAQTQGPVKA 320 

T + +H D T +SA T KA +P+ P + A T+P T 

Sbjct: 862 TTKEPTTIHKSPDE-STPELSAEPTPKALENSPKEPGVPTTKTPAATKPEMTTTAKDKTT 920 

Query: 321 ETP--KAPFQICPGPMITK-TLLQTYPVVSVTLPQTYPASTMTTTPPKTSPVPKVTIIKT 377 

E P P +TK T T + T T TTT T+P K+T +KT 

Sbjct: 921 ERDLRTTPETTTAAPKMTKETATTTEKTTESKITATTTQVTSTTTQD-TTPF-KITTLKT 978 

Query: 378 PAQMYPGPTVTKTAPHTCPMPTMT-KIQVHPTASRTGTPRQTCPATITAKNRPQVSL 433

+ P T TK T PTK+ TS+ TP+ P A +P + 

Sbjct: 979 TT-LAPKVTTTKKTITTTEIMNKPEETAKPKDRATNSKATTPKPQKPTK--APKKPTSTK 1035 

Query: 434 LASIMKSL — PQVCPGPA-MAKTPPQMHPVTTPAKNPLQT 470 

M + P+ P P M T P+++P + A+ LQT 
Sbjct: 1036 KPKTMPRVRKPKTTPTPRKMTSTMPELNPTSRIAEAMLQT 1075 

Score = 56 (8.4 bits), Expect = 3.1e-12, Sum P(2) = 3.1e-12
Identities = 17/60 (28%), Positives = 22/60 (36%)

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKS-KEHLPQQPAEGKTASRRVP 80 

T EP T P P PS E AP P+ + K+ P P E +■ + P 

Sbjct: 533 TTKEPAPTTTKSAPTTPKEPSPTTTKEPAPTTPKEPAPTTPKKPAPTTPKEPAPTTPKEP 592

Score = 52 (7.8 bits), Expect = 9.6e-16, Sum P(2) = 9.6e-16
Identities = 17/59 (28%), Positives = 22/59 (37%)

Query: 22 TVHEPV-VTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKSKEHLPQQPAE-GKTASRR 78 

T EP T P P P+ E P P+ +KE P P E TA ++ 

Sbjct: 431 TPKEPAPTTPKKPAPTTPKEPAPTTPKEPTPTTPKEPAPTTKEPAPTTPKEPAPTAPKK 489

Score = 51 (7.7 bits), Expect = 1.2e-15, Sum P(2) = 1.2e-15
Identities = 15/51 (29%), Positives = 19/51 (37%) 

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKS-KEHLPQQPAE 71

T EP T P P P+ + AP P+ + KE P P E 

Sbjct: 416 TTKEPAPTTTKSAPTTPKEPAPTTPKKPAPTTPKEPAPTTPKEPTPTTPKE 466

Score = 47 (7.1 bits), Expect = 3.2e-15, Sum P(2) = 3.2e-15
Identities = 12/41 (29%), Positives = 17/41 (41%) 

Query: 36 PAPAHPSLLDKMEKAPPQPQHEGLKSKEHLPQQPAEGKTAS 76

P P P + P +P +KS P++PA T S 

Sbjct: 350 PTPTTPK--EPASTTPKEPTPTTIKSAPTTPKEPAPTTTKS 388

Score = 47 (7.1 bits), Expect = 3.2e-15, Sum P(2) = 3.2e-15
Identities = 15/57 (26%), Positives = 19/57 (33%) 

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEG-LKSKEHLPQQPAEGKTASR 77

T EP T P P P+ E AP P+ +KE P T + 

Sbjct: 377 TPKEPAPTTTKSAPTTPKEPAPTTTKEPAPTTPKEPAPTTTKEPAPTTTKSAPTTPK 433 

Score = 46 (6.9 bits), Expect = 4.0e-15, Sum P(2) = 4.0e-15
Identities = 16/58 (27%), Positives = 22/58 (37%) 

Query: 20 LATVHEPVVT QWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKSKEHLPQQPAEGKT 74 

L T EP T + A P P+ + P +P KS P++PA T 

Sbjct: 344 LTTPKEPTPTTPKEPASTTPKEPTPTTIKSAPTTPKEPAPTTTKSAPTTPKEPAPTTT 401 

Score = 42 (6.3 bits), Expect = 1.0e-14, Sum P(2) = 1.0e-14
Identities = 15/60 (25%), Positives = 21/60 (35%) 

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKS-KEHLPQQPAEGKTASRRVP 80 

T EP T P P P+ + AP P+ + KE P E + + P 

Sbjct: 463 TPKEPAPTTKEPAPTTPKEPAPTAPKKPAPTTPKEPAPTTPKEPAPTTTKEPSPTTPKEP 522

Score = 39 (5.9 bits), Expect = 2.1e-14, Sum P(2) = 2.1e-14
Identities = 15/55 (27%), Positives = 20/55 (36%)

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKSKEHLPQQPAEGKTAS 76 

T EP T P PA + + P +P KS ++PA T S 

Sbjct: 494 TPKEPAPTT PKEPAPTTTKEPSPTTPKEPAPTTTKSAPTTTKEPAPTTTKS 544 



Pedant information for DKFZphtes3_4ol9, frame 2 
[LENGTH] 1180
[MW] 127693.40
[pI] 10.25
[HOMOL] SWISSPROT:MUC2_HUMAN MUCIN 2 PRECURSOR (INTESTINAL MUCIN 2). 1e-08
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YJR151c] 6e-06
[FUNCAT] 30.01 organization of cell wall [S. cerevisiae, YIR019c] 6e-06
[FUNCAT] 30.90 extracellular/secretion proteins [S. cerevisiae, YIR019c] 6e-06
[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YIR019c] 6e-06
[BLOCKS] BL00412B Neuromodulin (GAP-43) proteins
[PROSITE] CYTOCHROME_C 1
[PROSITE] MYRISTYL 12
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 8
[PROSITE] PKC_PHOSPHO_SITE 25
[PROSITE] ASN_GLYCOSYLATION 2
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 5.00 %

SEQ MTLQGRADLSGNQGNAAGRLATVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLK

SEG 

PRD cccccceeeccccccceeeeeeeeceeeeeeeecccccccceeeeccccccccccccccc 

SEQ SKEHLPQQPAEGKTASRRVPRLPAWESQAFKNILVDEMDMMHARAATLIQANWRGYWLR

SEG 

PRD cccccccccccccccccchhhhhhhhhhhhhhheeehhhhhhhhhhhhhhhhhccchhhh 

SEQ QKLISQMMAAKAIQEAWRRFNKRHILHSSKSLVKKTRAEEGDIPYHAPQQVRFQHPEENR

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhheeeecccchhhhhhhhcccccccccceeeecccccce 

SEQ LLSPPIMVNKETQFPSCDNLVLCRPQSSPLLQPPAAQGTPEPCVQGPHAARVRGLAFLPH 

SEG 

PRD eeccceeeecccccccccceeeecccccccccccccccccccccccccceeeeeeeeccc 

SEQ QTVTIRFPCPVSLDAKCQPCLLTRTIRSTCLVHIEGDSVKTKRVSARTNKARAPETPLSR 

SEG 

PRD eeeeeecccccccccccccccccccccceeeeecccccccceeeeecccccccccccccc 

SEQ RYDQAVTRPSRAQTQGPVKAETPKAPFQICPGPMITKTLLQTYPVVSVTLPQTYPASTMT 

SEG xxxx 

PRD ccceeeeeccccccccceeecccccccccccccccccccccccccccccccccccccccc 

SEQ TTPPKTSPVPKVTIIKTPAQMYPGPTVTKTAPHTCPMPTMTKIQVHPTASRTGTPRQTCP 

SEG xxxxxxxxxxxxx 

PRD cccccccccccceeeccccccccccccccccccccccccccceeeccccccccccccccc 

SEQ ATITAKNRPQVSLLASIMKSLPQVCPGPAMAKTPPQMHPVTTPAKNPLQTCLSATMSKTS 

SEG 

PRD CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ SQRSPVGVTKPSPQTRLPAMITKTPAQLRSVATILKTLCLASPTVANVKAPPQVAVAAGT 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ PNTSGSIHENPPKAKATVNVKQAAKVVKASSPSYLAEGKIRCLAQPHPGTGVPRAAAELP 

SEG xxxxxxxxxxxxxxxxx xxxxxx 

PRD CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ LEAEKIKTGTQKQAKTDMAFKTSVAVEMAGAPSWTKVAEEGDKPPHVYVPVDMAVTLPRG

SEG xxxx 

PRD ccccccccccccccccccccccccccccccccccceeeeccccccceeeccccccccccc 

SEQ QLAAPLTNASSQRHPPCLSQRPLAAPLTKASSQGHLPTELTKTPSLAHLDTCLSKMHSQT 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HLATGAVKVQSQAPLATCLTKTQSRGQPITDITTCLIPAHQAADLSSNTHSQVLLTGSKV 

SEG 

PRD ccccceeeeeccccceeeeccccccccccccccccccccccccccccccceeeeeccccc 

SEQ SNHACQRLGGLSAPPWAKPEDRQTQPQPHGHVPGKTTQGGPCPAACEVQGMLVPPMAPTG 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HSTCNVESWGDNGATRAQPSMPGQAVPCQEDTGPADAGVVGGQSWNRAWEPARGAASWDT 

SEG 

PRD CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ WRNKAWPPRRSGEPMVSMQAAEEIRILAVITIQAGVRGYLARRRIRLWHRGAMVIQATW 

SEG 

PRD ccceeecccccccccchhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhh 

SEQ RGYRVRRNLAHLCRATTTIQSAWRGYSTRRDQARHWQMLHPVTWVELGSRAGVMSDRSWF 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhheeeeeccchhhhhhhhhhh 

SEQ QDGRARTVSDHRCFQSCQAHACSVCHSLSSRIGSPPSWMLVGSSPRTCHTCGRTQPTRV 

SEG 

PRD hccceeeeccceeeecccceeeeeeeecccccccccceeeeeecccccccccccccccee 

SEQ VQGMGQGTEGPGAVSWASAYQLAALSPRQPHRQDKAATAIQSAWRGFKIRQQMRQQQMAA

SEG xxxxxxxxxxxxx 

PRD eeeccccccccccchhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ KIVQATWRGHHTRSCLKNTEALLGPADPSASSRHMHWPGI 

SEG xx 

PRD hhhhhhhccccccchhhhhhhhcccccccccccccccccc 
Prosite for DKFZphtes3_4ol9.2

PS00001 542->546 ASN_GLYCOSYLATION PDOC00001
PS00001 668->672 ASN_GLYCOSYLATION PDOC00001
PS00004 282->286 CAMP_PHOSPHO_SITE PDOC00004
PS00005 76->79 PKC_PHOSPHO_SITE PDOC00005
PS00005 148->151 PKC_PHOSPHO_SITE PDOC00005
PS00005 244->247 PKC_PHOSPHO_SITE PDOC00005
PS00005 265->268 PKC_PHOSPHO_SITE PDOC00005
PS00005 278->281 PKC_PHOSPHO_SITE PDOC00005
PS00005 281->284 PKC_PHOSPHO_SITE PDOC00005
PS00005 285->288 PKC_PHOSPHO_SITE PDOC00005
PS00005 288->291 PKC_PHOSPHO_SITE PDOC00005
PS00005 299->302 PKC_PHOSPHO_SITE PDOC00005
PS00005 322->325 PKC_PHOSPHO_SITE PDOC00005
PS00005 414->417 PKC_PHOSPHO_SITE PDOC00005
PS00005 424->427 PKC_PHOSPHO_SITE PDOC00005
PS00005 481->484 PKC_PHOSPHO_SITE PDOC00005
PS00005 610->613 PKC_PHOSPHO_SITE PDOC00005
PS00005 671->674 PKC_PHOSPHO_SITE PDOC00005
PS00005 679->682 PKC_PHOSPHO_SITE PDOC00005
PS00005 900->903 PKC_PHOSPHO_SITE PDOC00005
PS00005 959->962 PKC_PHOSPHO_SITE PDOC00005
PS00005 987->990 PKC_PHOSPHO_SITE PDOC00005
PS00005 1015->1018 PKC_PHOSPHO_SITE PDOC00005
PS00005 1049->1052 PKC_PHOSPHO_SITE PDOC00005
PS00005 1065->1068 PKC_PHOSPHO_SITE PDOC00005
PS00005 1106->1109 PKC_PHOSPHO_SITE PDOC00005
PS00005 1146->1149 PKC_PHOSPHO_SITE PDOC00005
PS00005 1171->1174 PKC_PHOSPHO_SITE PDOC00005
PS00006 22->26 CK2_PHOSPHO_SITE PDOC00006
PS00006 42->46 CK2_PHOSPHO_SITE PDOC00006
PS00006 156->160 CK2_PHOSPHO_SITE PDOC00006
PS00006 546->550 CK2_PHOSPHO_SITE PDOC00006
PS00006 848->852 CK2_PHOSPHO_SITE PDOC00006
PS00006 988->992 CK2_PHOSPHO_SITE PDOC00006
PS00006 1003->1007 CK2_PHOSPHO_SITE PDOC00006
PS00006 1027->1031 CK2_PHOSPHO_SITE PDOC00006
PS00008 11->17 MYRISTYL PDOC00008
PS00008 14->20 MYRISTYL PDOC00008
PS00008 539->545 MYRISTYL PDOC00008
PS00008 591->597 MYRISTYL PDOC00008
PS00008 746->752 MYRISTYL PDOC00008
PS00008 777->783 MYRISTYL PDOC00008
PS00008 853->859 MYRISTYL PDOC00008
PS00008 878->884 MYRISTYL PDOC00008
PS00008 882->888 MYRISTYL PDOC00008
PS00008 1008->1014 MYRISTYL PDOC00008
PS00008 1053->1059 MYRISTYL PDOC00008
PS00008 1083->1089 MYRISTYL PDOC00008
PS00190 1042->1048 CYTOCHROME_C PDOC00169



(No Pfam data available for DKFZphtes3_4ol9.2) 
DKFZphtes3_50j4



group: testes derived 

DKFZphtes3_50j4 encodes a novel 187 amino acid proline-rich protein.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown, proline-rich protein

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus : unknown 

Insert length: 1186 bp 

Poly A stretch at pos. 1176, polyadenylation signal at pos. 1126 
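
As a rough cross-check of the polyadenylation signal position given above, the sketch below scans a printed sequence row for the canonical AATAAA hexamer (and the common ATTAAA variant). The function name `find_polya_signal`, its `offset` parameter, and the choice to report a 0-based offset are assumptions made for illustration only; they are not taken from the annotation pipeline that produced this report.

```python
import re
from typing import Optional


def find_polya_signal(seq: str, offset: int = 0) -> Optional[int]:
    """Return the 0-based offset of the last AATAAA/ATTAAA hexamer, or None.

    `offset` is the 0-based position of the first base of `seq` within the
    full insert (a hypothetical convenience parameter).
    """
    # Drop the position numbers and spacing used in the printed listing.
    bases = re.sub(r"[^ACGT]", "", seq.upper())
    # A lookahead allows overlapping candidates to be found as well.
    hits = [m.start() for m in re.finditer(r"(?=AATAAA|ATTAAA)", bases)]
    return offset + hits[-1] if hits else None


# The listing row that starts at base 1101, i.e. 0-based offset 1100.
row_1101 = "CAGGCCTGAC AGATGTTTGG GAGAGGAATA AAGTTGTGTT GTTGTGGGGC"
print(find_polya_signal(row_1101, offset=1100))  # 1126
```

Under the 0-based assumption the hit coincides with the reported signal position 1126; with 1-based numbering the same hexamer would start at base 1127.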



1 CACTGGGCGT CTGAAGCTCA GAGCTCACCC CTGAGATGGG CTCTCCTAGG 

51 CCTCCTGGGA TGAGGGAGCC ACCAGGACCC AGTGCTGTGA TGCCTGCTCT 

101 TCCCTCTACC AGCACCTGCC CGCCCAGAGA CCAGGGCACC CCTGAAGTCC 

151 AGCCCACCCC TGCAAAGGAC ACATGGAAGG GCAAGCGGCC TCGATCCCAG 

201 CAGGAGAACC CAGAGAGCCA GCCTCAGAAG AGGCCACGCC CCTCAGCCAA 

251 GCCCTCCGTC GTAGCTGAGG TCAAGGGCAG CGTCTCGGCC AGCGAACAGG 

301 GCACCTTGAA TCCCACGGCT CAAGACCCCT TCCAGCTCTC CGCTCCTGGC 

351 GTCTCCTTGA AGGAGGCTGC AAATGTTGTG GTCAAGTGCC TCACCCCTTT 

401 CTACAAGGAG GGCAAGTTTG CTTCCAAGGA GTTGTTTAAA GGCTTTGCCC 

451 GCCACCTCTC ACACTTGCTG ACTCAGAAGA CCTCTCCTGG AAGGAGCGTG 

501 AAAGAAGAGG CCCAGAACCT CATCAGGCAC TTCTTCCATG GCCGGGCCCG 

551 GTGCGAGAGC GAAGCTGACT GGCATGGCCT GTGTGGCCCC CAGAGATGAC 

601 CAACTGCTGG CTGGGCAGGG CCCGCGTCCT CCCCCAGATT CTAGCATGGG 

651 TCATCCTGGG CCTCACCTGC TGATGCCAGG GCCATCGTCT TTTCTCAGTC 

701 CTTCTCCTTT CCAACCATAC TTGGCTTTGG GGATGACCCC AGACACCCCC 

751 TGAATCCAGG TCAGAGGTCA GCCCACCTTT CTTTCTGCTT GCAAAGCCTA 

801 TAGACCCTTC TCAGAGCGGT CCTCATGGCT GGGTTTTCTG GGACACATGT 

851 CGAGGACAGA AGGTGGAGGG TGGTGGAGCT GCTGCTGGAA GAAGGGGAAG 

901 GAAGAGTGGC CCCTCCCCGA GTTCTAAGTC AGGATGAGGC CCACCTGTCC 

951 AAGGTATCGG AACCTACCCA GGGGACCCTC AGATCCTCCA CCCACTCCCC 

1001 CATCCATTAC GATGCCAGCT TCCAGCCTTG CCCAGGTCAG AGCTGTGGCA 

1051 GAGGAGAGGC AGCCAGGCCC TGTTCCTGCT CAGCTCCTGC TCAGGAAGGC 

1101 CAGGCCTGAC AGATGTTTGG GAGAGGAATA AAGTTGTGTT GTTGTGGGGC 

1151 ATGCAGGCGT GCACACAGCC CTTTTCAAAA AAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 36 bp to 596 bp; peptide length: 187 
Category: putative protein 
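
The ORF coordinates and the peptide length above can be checked against each other with simple arithmetic. The following is a minimal sketch, assuming the coordinates are 1-based and inclusive and that the stop codon lies outside the stated range; the helper name `orf_peptide_length` is hypothetical.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of three")
    return span // 3


# ORF from 36 bp to 596 bp: 561 bases, i.e. 187 codons.
print(orf_peptide_length(36, 596))  # 187
```

With these assumptions the span of 561 bases corresponds to the stated peptide length of 187 residues.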



1 MGSPRPPGMR EPPGPSAVMP ALPSTSTCPP RDQGTPEVQP TPAKDTWKGK 
51 RPRSQQENPE SQPQKRPRPS AKPSVVAEVK GSVSASEQGT LNPTAQDPFQ 
 101 LSAPGVSLKE AANVVVKCLT PFYKEGKFAS KELFKGFARH LSHLLTQKTS
151 PGRSVKEEAQ NLIRHFFHGR ARCESEADWH GLCGPQR 

BLASTP hits 

Entry MMU92455_1 from database TREMBL : 
product: "WW domain binding protein 7"; Mus musculus WW domain binding
protein 7 mRNA, partial cds. 

Score = 134, P = 6.9e-08, identities = 45/125, positives = 56/125



Alert BLASTP hits for DKFZphtes3_50j4, frame 3
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_50j4, frame 3 
[LENGTH] 187
[MW] 20353.06
[pI] 9.76
[PROSITE] MYRISTYL 1
[PROSITE] AMIDATION 1
[PROSITE] CK2_PHOSPHO_SITE
[PROSITE] PKC_PHOSPHO_SITE
[KW] All_Alpha
[KW] LOW_COMPLEXITY 8.56 %



SEQ MGSPRPPGMREPPGPSAVMPALPSTSTCPPRDQGTPEVQPTPAKDTWKGKRPRSQQENPE
SEG xxxxxxxxxxxxxxxx
PRD CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

SEQ SQPQKRPRPSAKPSVVAEVKGSVSASEQGTLNPTAQDPFQLSAPGVSLKEAANVVVKCLT
SEG
PRD cccccccccccccchhhhhccccccccccccccccccccccccccccchhhhhhheeecc

SEQ PFYKEGKFASKELFKGFARHLSHLLTQKTSPGRSVKEEAQNLIRHFFHGRARCESEADWH
SEG
PRD cccccccchhhhhhhhhhhhhhhhheeecccccchhhhhhhhhhhhhhccchhhhhhhhh

SEQ GLCGPQR
SEG
PRD CCCCCCC



Prosite for DKFZphtes3_50j4.3

PS00005 3->6 PKC_PHOSPHO_SITE PDOC00005
PS00005 46->49 PKC_PHOSPHO_SITE PDOC00005
PS00005 70->73 PKC_PHOSPHO_SITE PDOC00005
PS00005 107->110 PKC_PHOSPHO_SITE PDOC00005
PS00005 146->149 PKC_PHOSPHO_SITE PDOC00005
PS00005 154->157 PKC_PHOSPHO_SITE PDOC00005
PS00006 54->58 CK2_PHOSPHO_SITE PDOC00006
PS00006 84->88 CK2_PHOSPHO_SITE PDOC00006
PS00006 94->98 CK2_PHOSPHO_SITE PDOC00006
PS00006 107->111 CK2_PHOSPHO_SITE PDOC00006
PS00006 154->158 CK2_PHOSPHO_SITE PDOC00006
PS00006 175->179 CK2_PHOSPHO_SITE PDOC00006
PS00008 81->87 MYRISTYL PDOC00008
PS00009 48->52 AMIDATION PDOC00009

(No Pfam data available for DKFZphtes3_50j4.3)
DKFZphtes3_50n06
group: testes derived 

DKFZphtes3_50n06 encodes a novel 186 amino acid protein without similarity to known proteins.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes. 

unknown 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus : unknown 

Insert length: 1095 bp 

Poly A stretch at pos. 1085, polyadenylation signal at pos. 1061

1 CAAGACCCTC GGAGCCAAGA AACAACACTG AGTTCCAGAT TTCGGAAGGT 

51 TCACGAGTGT TGCCGACACG CCCTCCCAAC TGCAGACATC CTCCCTGGAG 

101 GACCTGCTGT GCTCACATGC CCCCCTGTCC AGCGAGGACG ACACCTCCCC 

151 GGGCTGTGCA GCCCCCTCCC AGGCACCCTT CAAGGCCTTC CTCAGTCCCC 

201 CAGAGCCACA TAGCCACCGA GGCACCGACA GGAAGCTGTC CCCGCTCCTG 

251 AGCCCCTTGC AAGACTCACT GGTGGACAAG ACCCTGCTGG AGCCCAGGGA 

301 GATGGTCCGG CCTAAGAAGG TGTGTTTCTC GGAGAGCAGC CTGCCCACCG 

351 GGGACAGGAC CAGGAGGAGC TACTACCTCA ATGAGATCCA GAGCTTCGCG 

401 GGCGCCGAGA AGGACGCGCG CGTGGTGGGC GAGATCGCCT TCCAGCTGGA 

451 CCGCCGCATC CTGGCCTACG TGTTCCCGGG CGTGACGCGG CTCTACGGCT 

501 TCACGGTGGC CAACATCCCC GAGAAGATCG AGCAGACCTC CACCAAGTCT 

551 CTGGACGGCT CCGTGGACGA GAGGAAGCTG CGCGAGCTGA CGCAGCGCTA 

601 CCTGGCCCTG AGCGCGCGCC TGGAGAAGCT GGGCTACAGC CGCGACGTGC 

651 ACCCGGCGTT CAGCGAGTTC CTCATCAACA CCTACGGAAT CCTGAAGCAG 

701 CGGCCCGACC TGCGCGCCAA CCCCCTGCAC AGCAGCCCGG CCGCGCTGCG 

751 CAAGCTGGTC ATCGACGTGG TGCCCCCCAA GTTCCTGGGC GACTCGCTGC 

801 TGCTGCTCAA CTGCCTGTGC GAGCTCTCCA AGGAGGACGG CAAGCCCCTC 

851 TTCGCCTGGT GAGCCGCCCC GCGCCCGCCG CCTTGCCTGC AGTAAACGCG 

901 TTTGTTCCAA CCCGGGGCCG CGGTGCCTCC TGCGCGTCCC CCCGGAGGGG 

951 AAAGGGCCGC GTCCCCCGCG CGCGAGGCCA GAGAAGGCCC CGCTCCCACC 

1001 GGTGCTGGGC CCCGACCGCA GCCCGCCGCT GCCCGCACCT GCGGAGTGCT 

1051 TCTCACCCCT CATTAAAATC ATCCGTTTGC TTGTCAAAAA AAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 302 bp to 859 bp; peptide length: 186 
Category: putative protein 
Classification: no clue 



 1 MVRPKKVCFS ESSLPTGDRT RRSYYLNEIQ SFAGAEKDAR VVGEIAFQLD
51 RRILAYVFPG VTRLYGFTVA NIPEKIEQTS TKSLDGSVDE RKLRELTQRY 
101 LALSARLEKL GYSRDVHPAF SEFLINTYGI LKQRPDLRAN PLHSSPAALR 
151 KLVIDVVPPK FLGDSLLLLN CLCELSKEDG KPLFAW 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_50n06, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphtes3_50n06, frame 2 



Report for DKFZphtes3_50n06.2 

[LENGTH] 186 

[MW] 21049.39 

[pI] 9.28

[KW] All_Alpha

[KW] LOW_COMPLEXITY 5.38 %

SEQ MVRPKKVCFSESSLPTGDRTRRSYYLNEIQSFAGAEKDARVVGEIAFQLDRRILAYVFPG 

SEG 

PRD ccccceeeccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccc 

SEQ VTRLYGFTVANIPEKIEQTSTKSLDGSVDERKLRELTQRYLALSARLEKLGYSRDVHPAF 

SEG 

PRD ceeeeeeeeeeccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhccccccccch 

SEQ SEFLINTYGILKQRPDLRANPLHSSPAALRKLVIDVVPPKFLGDSLLLLNCLCELSKEDG

SEG xxxxxxxxxx 

PRD hhhhhhcceeecccccccccccccchhhhhhhhhhccccccccchhhhhhhhhhhhcccc 

SEQ KPLFAW 

SEG 

PRD CCCCCC 

(No Prosite data available for DKFZphtes3_50n06.2) 
(No Pfam data available for DKFZphtes3_50n06.2)
DKFZphtes3_50n23



group: testes derived 

DKFZphtes3_50n23 encodes a novel 499 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.



unknown 
2 EST hits 

(from other testis libraries) testis-specific cDNA?

Sequenced by DKFZ 

Locus : unknown 

Insert length: 1907 bp 

Poly A stretch at pos. 1897, polyadenylation signal at pos. 1872



1 GGGCACCAGC CACTTTCCAC CATGACTGTG CGCTCGAGGG TCGCAGATGT 
51 GTTCGGCAGC AAGGACACTG AGAGCCTTGA GCCTGTGCTT TTACCCTTAG 
101 TAGATCGCAG GTTTCCTAAG AAATGGGAAA GACCGGTGGC AGAAAGCTTA 
151 GGCCACAAAG ACAAAGACCA GGAGGACTAC TTCCAGAAGG GAGGACTCCA 
201 AATTAAGTTC CACTGTAGCA AGCAGCTGTC TCTAGAGAGC TCCAGGCAGG 
251 TGACCTCTGA GAGCCAAGAG GAGCCCTGGG AGGAGGAATT CGGCCGGGAG 
301 ATGCGGAGGC AGCTGTGGCT GGAGGAGGAG GAGATGTGGC AGCAGCGGCA 
351 GAAGAAGTGG GCCCTGCTGG AGCAGGAGCA TCAGGAGAAG CTGCGGCAGT 
401 GGAATCTGGA AGACCTGGCC AGGGAGCAAC AGCGGAGATG GGTCCAGCTA 
451 GAAAAGGAGC AGGAGAGCCC ACGGAGAGAG CCAGAGCAGC TAGGGGAGGA 
501 TGTGGAGAGG AGGATCTTCA CACCCACCAG TCGATGGAGG GACTTGGAGA 
551 AGGCAGAGCT ATCATTAGTG CCTGCCCCAA GCCGGACCCA ATCTGCTCAC 
601 CAAAGCAGGA GGCCACACTT GCCCATGTCT CCTAGTACCC AGCAGCCTGC 
651 CCTGGGAAAG CAGAGACCTA TGAGTTCAGT GGAGTTTACC TACAGACCAC 
701 GGACCCGCCG AGTTCCCACA AAGCCCAAGA AATCTGCCTC CTTTCCTGTC 
751 ACTGGGACAT CCATCCGAAG GCTGACCTGG CCCTCTTTGC AGATATCCCC 
801 TGCAAATATT AAGAAGAAGG TGTACCACAT GGACATGGAG GCCCAGAGGA 
851 AGAACCTGCA GCTCCTGAGT GAGGAGTCTG AGTTGAGGCT GCCCCACTAC 
901 CTGCGCAGCA AAGCACTGGA GCTCACCACC ACCACCATGG AGCTGGGCGC 
951 GCTCAGGCTG CAGTACCTGT GCCATAAGTA CATCTTCTAT AGACGCCTCC 
1001 AGAGCCTCCG GCAAGAAGCG ATCAACCATG TACAAATCAT GAAAGAAACG 
1051 GAGGCTTCCT ACAAGGCCCA GAACCTCTAC ATCTTCCTGG AAAACATTGA 
1101 CCGCCTGCAG AGTCTCAGGC TGCAGGCCTG GACGGACAAG CAGAAGGGGC 
1151 TGGAGGAGAA GCACCGAGAG TGCCTGAGCA GCATGGTGAC CATGTTCCCC 
1201 AAGCTCCAGC TGGAGTGGAA CGTTCACCTG AACATCCCTG AGGTCACCTC 
1251 GCCAAAGCCA AAGAAATGCA AGTTGCCTGC AGCCTCACCC CGGCACATCC 
1301 GCCCCAGTGG CCCCACCTAC AAGCAGCCCT TTCTGTCTAG GCACCGGGCA 
1351 TGTGTGCCCC TGCAGATGGC CCGCCAACAG GGGAAGCAGA TGGAGGCTGT 
1401 CTGGAAGACC GAGGTGGCCT CCTCCAGTTA CGCAATAGAA AAAAAGACCC 
1451 CTGCCAGCCT TCCCCGGGAC CAGCTGAGGG GACACCCAGA TATTCCCCGG 
1501 CTGTTGACAC TGGACGTGTA GTCCTCCTGC CACAAAAGCC TGAACTTCCT 
1551 GAAGGCCCAG TAAGCGCCTC AGCGAACCAA AGGAAGGAAT GCCAGGAACC 
1601 TACAAATGAA TCCGCTTAGC TTGTTCAAAA AAAGTCAAGC GAGTCACTCC 
1651 CTGGAACCCA AATAAGCCAG AAGGATCAAG ACAGCCCCAG TCTCCACTGC 
1701 ATCCCTCAGC CAGTGATTCT CAACCTTCTG AGGGACGGAA ACCCACAGAG 
1751 AACTTGGTCA AAATGCAGGT TCCCAGCTGG TGCTTTTAAA GAAACCCTCT 
1801 GGGGGTTGCT GAGTACTCCT AGAACTTTGA GAAACACTGC TTCCCTCCTG 
1851 CAGTCCCCAA ACTCTACATT TTAATAAAAT AGAGGTTGGT TTATTTTAAA 
1901 AAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 
Peptide information for frame 1



ORF from 22 bp to 1518 bp; peptide length: 499
Category: similarity to known protein 
Classification: no clue 



1 MTVRSRVADV FGSKDTESLE PVLLPLVDRR FPKKWERPVA ESLGHKDKDQ 
51 EDYFQKGGLQ IKFHCSKQLS LESSRQVTSE SQEEPWEEEF GREMRRQLWL 
101 EEEEMWQQRQ KKWALLEQEH QEKLRQWNLE DLAREQQRRW VQLEKEQESP 
151 RREPEQLGED VERRIFTPTS RWRDLEKAEL SLVPAPSRTQ SAHQSRRPHL 
201 PMSPSTQQPA LGKQRPMSSV EFTYRPRTRR VPTKPKKSAS FPVTGTSIRR 
251 LTWPSLQISP ANIKKKVYHM DMEAQRKNLQ LLSEESELRL PHYLRSKALE 
 301 LTTTTMELGA LRLQYLCHKY IFYRRLQSLR QEAINHVQIM KETEASYKAQ
351 NLYIFLENID RLQSLRLQAW TDKQKGLEEK HRECLSSMVT MFPKLQLEWN 
401 VHLNIPEVTS PKPKKCKLPA ASPRHIRPSG PTYKQPFLSR HRACVPLQMA 
451 RQQGKQMEAV WKTEVASSSY AIEKKTPASL PRDQLRGHPD IPRLLTLDV 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_50n23, frame 1 

PIR:S28589 trichohyalin - rabbit, N = 1, Score = 134, P = 5.3e-05 

TREMBLNEW:AF132479_1 product: "Ese2L protein"; Mus musculus Ese2L
protein mRNA, complete cds., N = 1, Score = 130, P = 0.00017



>PIR:S28589 trichohyalin - rabbit 
Length = 1,407

HSPs: 

Score = 134 (20.1 bits), Expect = 5.3e-05, P = 5.3e-05
Identities = 88/354 (24%), Positives = 154/354 (43%)

Query: 29 RRFPKKWERPVAESLGHKDKDQEDYFQKGGLQIK-FHCSKQLSLESSRQVTSESQEEPWE 87 

R++ K +R + L + ++E ++ G + F +QL +++ E +EE + 

Sbjct: 165 RQYRDKEQRLQRQELEERRAEEEQLRRRKGRDAEEFIEEEQLRRREQQELKRELREEEQQ 224 

Query: 88 EEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLEDLAREQQRRWVQLEKEQ 147 

RE + L+EEE RQ++W E Q++LR+ LE++ RE+++R Q E+ + 
Sbjct: 225 RRERREQHERA-LQEEEEQLLRQRRWRE-EPREQQQLRR-ELEEI-REREQRLEQEERRE 280 

Query: 148 ESPRREPEQLGEDVERRIFTPTSRWRDLEKAELSLVPAPSRTQSAHQSRRPHLPMSPSTQ 207 

+ RRE ++L E ERR ++ + EL RQQR + + 

Sbjct: 281 QQLRRE-QRL-EQEERREQQLRRELEEIREREQRLEQEERREQRLEQEERREQQLKRELE 338 

Query: 208 QPALGKQRPMSSVEFTYRPRTRRVPTKPKKSASFPVTGTSIRRLTWPSLQISPANIKK-K 266 

+ +QR +E RR + ++++A GS+RW SA++K 

Sbjct: 339 EIREREQR LEQEER-REQLLAEEVREQAR — ERGESLTR-RWQRQLESEAGARQSK 390 

Query: 267 VYHMDMEAQRKNLQLLSEESELRLPHYLRSKALELTTTTM ELGALRLQY LCHK Y 320 

VY +R+ QL++ER R + LE E RQL + 

Sbjct: 391 VYS RPRRQEEQSLRQDQE RR- QRQER E REL EEQA RRQQQWQA EE E S E RRRQRL S ARP 446 

Query: 321 IFYRRLQSLRQEAINHVQIMKETEASYKAQNLYI-FLENIDRLQSL-RLQAWTDKQKGLE 378 

RQ +E Q+EE ++ + FLE ++LQ R Q ++ E 

Sbjct: 447 SLRER-QLRAEERQEQEQRFREEEEQRRERRQELQFLEEEEQLQRRERAQQLQEEDSFQE 505 

Query: 379 EKHR 382 
++ R 

Sbjct: 506 DRER 509 
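
The whole-number percentages on the Identities and Positives lines of these HSP headers appear to be truncated rather than rounded: 88/354 is about 24.9% but is printed as 24%. A minimal sketch of that calculation follows, assuming truncation toward zero; the function name `hsp_percent` is hypothetical and is not part of any BLAST distribution.

```python
def hsp_percent(matches: int, aligned: int) -> int:
    """Whole-number percentage of matching positions, truncated toward zero."""
    if aligned <= 0:
        raise ValueError("alignment length must be positive")
    if not 0 <= matches <= aligned:
        raise ValueError("matches must lie between 0 and the alignment length")
    # Integer division drops the fractional part instead of rounding it.
    return (100 * matches) // aligned


print(hsp_percent(88, 354))   # 24
print(hsp_percent(154, 354))  # 43
```

This is useful mainly as a sanity check when repairing damaged header lines, since a mismatch between the fraction and the percentage points to a misread digit.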

Score = 119 (17.9 bits), Expect = 2.2e-03, P = 2.2e-03 
Identities = 79/357 (22%), Positives = 150/357 (42%)

Query: 33 KKWERPVAESLGHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTSESQEEPWEEEFGR 92 

++ E+ + + K +++E Q+ + + +Q R+ + + + EE+F + 

Sbjct: 990 RREEQELRQERDRKFREEEQLLQE— REEERLRRQERDRKFREEERQLRRQELEEQFRQ 104 6 

Query: 93 EMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLEDLAREQQRRWVQLEKEQESPRR 152 

E R+ LEE+ + Q++++K L QE K R+ E+ R +Q R QL +E++ R 
Sbjct: 1047 ERDRKFRLEEQ-IRQEKEEK-QLRRQERDRKFRE EEQQRRRQEREQQLRRERDRKFR 1101 

Query: 153 EPEQLGEDVERRIFTPTSRWRDLEKAELSLVPAPSRTQSAHQSR--RPHLPMSPSTQQPA 210 
E EQL ++E RRL+EL+ + +RR + +++

Sbjct: 1102 EEEQLLQEREEERLRRQERARKLREEE-QLLRREEQLLRQERDRKFREEEQLLQESEEER 1160 

Query: 211 LGKQ RPMSSVEFTYRPRTRRVPTKPKKSASFPVTGTSIRRLTWPSLQISPANIKKKV 267 

L+QR+E+R + +++ +R+ Q ++++ 

Sbjct: 1161 LRRQERERKLREEEQLLQEREEERLRRQERARKLREEEQLLRQEEQELRQERARKLREEE 1220 

Query: 268 YHMDMEAQ RKNLQLLS-EESELRLPHYLRSKALELTTTTMELGALRLQYL 316 

+ E Q R+ QLL EE ELR + + E E LR Q 

Sbjct: 1221 QLLRQEEQELRQERDRKFREEEQLLRREEQELRRERDRKFREEEQLLQEREEERLRRQER 1280 

Query: 317 CHKYIFYRRLQSLRQEAINHVQIMKETEASYKAQNLYIFLENIDRLQ-SLRLQAWTDKQK 375

K + L E ++ +E + Y+A+ + E RL+ LR + +++ 

Sbjct: 1281 ARK— LREEEEQLLFEEQEEQRLRQERDRRYRAEEQFAREEKSRRLERELRQEEEQRRRR 1338 

Query: 376 GLEEKHRE 383
E K RE 

Sbjct: 1339 ERERKFRE 1346 

Score = 109 (16.4 bits), Expect = 1.9e-01, P = 1.7e-01
Identities = 37/113 (32%), Positives = 60/113 (53%)

Query: 67 KQLSLESSRQVTSESQ— EEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKL 124 

+QL E R+ E Q +E EE R+ R + EEE++ Q+R+++ L QE + KL 
Sbjct: 764 QQLRRERDRKFREEEQLLQEREEERLRRQERERKLREEEQLLQEREEE-RLRRQERERKL 822 

Query: 125 RQWNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAE 179 

R+ EL +E++ ++ +E+E RE EQL E+ + R R L + E 

Sbjct: 823 REE — EQLLQEREEERLR-RQERERKLREEEQLLRQEEQEL — RQERARKLREEE 872 

Score = 107 (16.1 bits), Expect = 3.0e-01, P = 2.6e-01
Identities = 35/109 (32%), Positives = 61/109 (55%)

Query: 71 LESSRQVTSESQEEPWE-EEFGREMRRQL WLEEEEMWQQRQKKWALLEQEHQEKLRQ 126

L Q+ ES+EE +E +++RR+ + EEE++ Q+R+++ L QE + KLR+ 
Sbjct: 742 LREEEQLLQESEEERLRRQEREQQLRRERDRKFREEEQLLQEREEE-RLRRQERERKLRE 800 

Query: 127 WNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAE 179 

E L +E++ ++ +E+E RE EQL ++ E R R L + E 

Sbjct: 801 E--EQLLQEREEERLR-RQERERKLREEEQLLQEREEERLRRQERERKLREEE 850 

Score = 104 (15.6 bits), Expect = 9.4e-02, P = 9.0e-02 
Identities = 84/339 (24%), Positives = 149/339 (43%) 

Query: 67 KQLSLESSRQVTSESQEEPWEEEFGREMRRQL-WLEEEEMWQQRQKKWALLEQE — HQEK 123 

+QL E ++ +EE EE RE R++L +LEEEE Q+R++ L E++ +++ 
Sbjct: 451 RQLRAEERQEQEQRFREE EEQRRERRQELQFLEEEEQLQRRERAQQLQEEDSFQEDR 507 

Query: 124 LRQWNLEDLAREQQRRWVQLEKEQES PRR EP EQLGEDVE-RRI FTPTSRWRDL 175 

R+ ++ Q RW QL++E + R +P EQL E+ E +R R R+ 

Sbjct: 508 ERRRRQQEQRPGQTWRW-QLQEEAQRRRHTLYAKPGQQEQLREEEELQREKRRQEREREY 566 

Query: 176 EKAELSLVPAPSRTQSAHQSRRPHLPMSPSTQQPALGKQRPMSSVEFTYRPRT RRV 231 

+ EL + + R+ + Q+L + R+ E + R RR 

Sbjct: 567 RE E E - KLQREE DE KRRRQER E RQY RE LE E LRQE EQL -RDRKLREEEQLLQEREEERL RRQ 624 

Query: 232 PTKPK KSASFPVTGTSIRRLTWPSLQISPANIKKKVYHMDMEAQRK NLQLLSEE 285 

+ K + +R+ L+ ++++ + E +RK QLL E 

Sbjct: 625 ERERKLREEEQLLRQEEQELRQERERKLREEEQLLRREEQELRQERERKLREEEQLLQER 684 

Query: 286 SELRL PHY LRS KALE LTTTTMELGALRLQYLCHKYIFYRRL-QSLRQEAINHV-- 337 

E RL R++ L L ELR + L+ RRQ LRQE + 

Sbjct: 685 EEERLRRQERARKLREEEQLLRQEEQELRQERERKLREEEQLLRREEQLLRQERDRKLRE 744 

Query: 338 — QIMKETEASYKAQNLYIFLENIDRLQSLRLQAWTDKQKGLEEKHRECL 385 

Q+++E+E + E +L+ R + + ++++ L+E+ E L 

Sbjct: 745 EEQLLQESEEERLRRQ EREQQLRRERDRKFREEEQLLQEREEERL 789 

Score = 103 (15.5 bits), Expect = 1.2e-01, P = 1.1e-01
Identities = 42/152 (27%), Positives = 74/152 (48%)

Query: 36 ERPVAESLGHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTSESQEEPWEEEFG-REM 94 

ER + K +++E ++ +++ +++L E + + E QE E + RE 
Sbjct: 835 ERLRRQERERKLREEEQLLRQEEQELRQERARKLR-EEEQLLRQEEQELRQERDRKLREE 893 

Query: 95 RRQLWLEEEEMWQQRQKKWA LLEQEHQEKLRQWNLEDLAREQQ RRWVQ-LEKE 146 

L EE+E+ Q+R +K LL++ +E+LR+ E RE++ RR Q L +E 

Sbjct: 894 EQLLRQEEQELRQERDRKLREEEQLLQESEEERLRRQERERKLREEEQLLRREEQELRRE 953 

Query: 147 QESPRREPEQLGEDVERRIFTPTSRWRDLEKAE 179 
+ RE EQL ++ E R R L + E 
Sbjct: 954 RARKLREEEQLLQEREEERLRRQERARKLREEE 986

Score = 103 (15.5 bits), Expect = 7.8e-01, P = 5.4e-01
Identities = 31/91 (34%), Positives = 52/91 (57%)

Query: 67 KQLSLESSRQVTSESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQ 126 

++L E R++ E Q EE+ R+ R + EEE++ Q+R+++ L QE KLR+ 
Sbjct: 642 QELRQERERKLREEEQLLRREEQELRQERERKLREEEQLLQEREEE-RLRRQERARKLRE 700 

Query: 127 WNLEDLAREQQRRWVQLEKEQESPRREPEQL 157 

E L R++++ +L +E+E RE EQL 
Sbjct: 701 E — EQLLRQEEQ ELRQERERKLREEEQL 726 

Score = 101 (15.2 bits), Expect = 2.0e-01, P = 1.8e-01
Identities = 38/111 (34%), Positives = 57/111 (51%) 

Query: 72 ESSRQVTSESQEEPWEE-EFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLE 130 

E R++ E Q EE E RE R+L EEE++ Q+R+++ L QE KLR+ + 
Sbjct: 931 ERERKLREEEQLLRREEQELRRERARKL-REEEQLLQEREEE-RLRRQERARKLREEE-Q 987 

Query: 131 DLAREQQRRWVQLEKEQESPRREPEQLGEDVERRI FTPTSRWRDLEKAELSL 182 

L RE+Q +L +E++ RE EQL ++ E R R + E L 

Sbjct: 988 LLRREEQ ELRQERDRKFREEEQLLQEREEERLRRQERDRKFREEERQL 1035 

Score = 101 (15.2 bits), Expect = 1.3e+00, P = 7.2e-01 
Identities = 33/108 (30%), Positives = 56/108 (51%)

Query: 72 ESSRQVTSESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLED 131 

E R++ E Q EE+ R+ R + EEE++ +Q +++ L QE KLR+ E 
Sbjct: 841 ERERKLREEEQLLRQEEQELRQERARKLREEEQLLRQEEQE LRQERDRKLREE — EQ 895 

Query: 132 LAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAE 179 

L R++++ +L +E++ RE EQL ++ E R R L + E 

Sbjct: 896 LLRQEEQ ELRQERDRKLREEEQLLQESEEERLRRQERERKLREEE 940 

Score = 99 (14.9 bits), Expect = 2.0e+00, P = 8.7e-01 
Identities = 32/97 (32%), Positives = 50/97 (51%) 

Query: 72 ESSRQVTSESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLED 131

 E R+ E Q EE E R L EEE Q +++ L QE + KLR+ E
Sbjct: 578 EKRRRQERERQYRELEELRQEEQLRDRKLREEEQLLQEREEERLRRQERERKLREE--EQ 635

Query: 132 LAREQ QRRWVQLEKEQESPRREPEQLGEDVERRI 165 

L R++ Q R +L +E++ RRE ++L ++ ER++ 

Sbjct: 636 LLRQEEQELRQERERKLREEEQLLRREEQELRQERERKL 674 

Score = 99 (14.9 bits), Expect = 2.0e+00, P = 8.7e-01
Identities = 34/111 (30%), Positives = 58/111 (52%) 

Query: 67 KQLSLESSRQVTSESQ—EEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKL 124 

++L E R++ E Q +E EE R+ R + EEE++ +Q +++ L QE + KL 
Sbjct: 664 QELRQERERKLREEEQLLQEREEERLRRQERARKLREEEQLLRQEEQE LRQERERKL 720 

Query: 125 RQWNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEK 177 

R+ + L RE+Q L +E++ RE EQL ++ E R + L + 

Sbjct: 721 REEE-QLLRREEQL LRQERDRKLREEEQLLQESEEERLRRQEREQQLRR 768 

Score = 98 (14.7 bits), Expect = 2.6e+00, P = 9.2e-01
Identities = 37/146 (25%), Positives = 77/146 (52%) 

Query: 20 EPVLLPLVDRRFPKKWERPVAESLGHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTS 79 

E LL ++ ++ ER + E + +E+ ++ K +QL + +++ 

Sbjct: 655 EEQLLRREEQELRQERERKLREEEQLLQEREEERLRRQERARKLREEEQLLRQEEQELRQ 714 

Query: 80 ESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLED-LAREQQR 138 

E + + EEE + +RR+ L +E ++ +++ LL++ +E+LR+ E L RE+ R 
Sbjct: 715 ERERKLREEE — QLLRREEQLLRQERDRKLREEEQLLQESEEERLRRQEREQQLRRERDR 772 

Query: 139 RWVQLEKEQES PRREPEQLG-EDVERRI 165 

++ E+EQ RE E+L ++ ER++ 
Sbjct: 773 KF — REEEQLLQEREEERLRRQERERKL 798 

Score = 97 (14.6 bits), Expect = 3.3e+00, P = 9.6e-01
Identities = 38/129 (29%), Positives = 63/129 (48%)

Query: 72 ESSRQVTSESQ— EEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNL 129 

E R++ E Q +E EE R+ R + EEE++ +Q +++ L QE KLR+ 
Sbjct: 817 ERERKLREEEQLLQEREEERLRRQERERKLREEEQLLRQEEQE LRQERARKLREE — 871 

Query: 130 EDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAELSLVPAPSRT 189 
E L R++++ +L +E++ RE EQL E+ + R R L + E L+ 
Sbjct: 872 EQLLRQEEQ ELRQERDRKLREEEQLLRQEEQEL — RQERDRKLREEE-QLLQESEEE 925 

Query: 190 
+ Q R L 
Sbjct: 926 RLRRQERERKL 936 

Score = 96 (14.4 bits), Expect = 4.1e+00, P = 9.8e-01 
Identities = 41/132 (31%), Positives = 69/132 (52%) 

Query: 46 KDKDQEDYFQKGGLQI-KFHCSKQLSLESSRQVTSESQEEPWEEEFGREMRRQLWLEEEE 104 

Sbjct: 473 RERRQELQFLEEEEQLQRRERAQQLQEEDSFQEDRERRRRQQEQRPGQTWRWQL QEE 529 

Query: 105 164 
++R +A Q QE+LR+ E+L RE++R+ E+E+E E Q ED +RR 
Sbjct: 530 AQRRRHTLYAKPGQ — QEQLREE — E ELQREKRRQ EREREYREEEKLQREEDEKRR 581 

Query: 165 
++R+LE+ 
Sbjct: 582 RQERERQYRELEE 594 

Score = 96 (14.4 bits), Expect = 4.1e+00, P = 9.8e-01 
Identities = 35/138 (25%), Positives = 76/138 (55%) 

Query: 28 DRRFPKKWERPVAESL-GHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTSESQEEPW 86 
+R++ + E EL K +++E Q+ + ++ L Q+ + ++E 
Sbjct: 586 ERQYRELEELRQEEQLRDRKLREEEQLLQEREEERLRRQERERKLREEEQLLRQEEQE-L 644 

Query: 87 EEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLEDLAREQQRRWVQL 143 
+E R++R + L EE+E+ Q+R++K L +E Q L++ E L R+++ R +L 
Sbjct: 645 RQERERKLREEEQLLRREEQELRQERERK LREEEQ-LLQEREEERLRRQERAR--KL 698 

Query: 144 EKEQESPRREPEQLGEDVERRI 165 
+E++ R+E ++L + + ER++ 
Sbjct: 699 REEEQLLRQEEQELRQERERKL 720 

Score = 95 (14.3 bits), Expect = 5.2e+00, P = 9.9e-01 
Identities = 59/282 (20%), Positives = 121/282 (42%) 

Query: 20 EPVLLPLVDRRFPKKWERPVAESLGHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTS 79 
E LL ++ ++ ER + E + +E+ ++ K +QL + +++ 
Sbjct: 655 EEQLLRREEQELRQERERKLREEEQLLQEREEERLRRQERARKLREEEQLLRQEEQELRQ 714 

Query: 80 ESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLED-LAREQQR 138 
E + + EEE + +RR+ L +E ++ +++ LL++ +E+LR+ E L RE+ R 
Sbjct: 715 ERERKLREEE--QLLRREEQLLRQERDRKLREEEQLLQESEEERLRRQEREQQLRRERDR 772 

Query: 139 RWVQLEKEQESPRREPEQLG-EDVERRIFTPTSRWRDLEKAELSLVPAPSRTQSAHQ--S 195 
+ + E+EQ RE E+L ++ ER++ ++ E+ L + + Q 
Sbjct: 773 KF--REEEQLLQEREEERLRRQERERKLREEEQLLQEREEERLRRQERERKLREEEQLLQ 830 

Query: 196 RRPHLPMSPSTQQPALGKQRPMSSVEFTYRPRTRRVPTKPKKSASFPVTGTSIRRLTWPS 255 
R + ++ L ++ + E R R ++ +R+ 
Sbjct: 831 EREEERLRRQERERKLREEEQLLRQE-EQELRQERARKLREEEQLLRQEEQELRQERDRK 889 

Query: 256 LQISPANIKKKVYHMDMEAQRK NLQLLSEESELRLPHYLRSKAL 299 
L+ ++++ + E RK QLL E E RL R + L 
Sbjct: 890 LREEEQLLRQEEQELRQERDRKLREEEQLLQES EEE RLRRQERERKL 936 

Score = 94 (14.1 bits), Expect = 1.1e+00, P = 6.8e-01 
Identities = 35/116 (30%), Positives = 59/116 (50%) 

Query: 72 ESSRQVTSESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEK L 124 

E +R++ E Q EE+ R+ R + + EEE++ Q+R+++ L QE K L 
Sbjct: 977 ERARKLREEEQLLRREEQELRQERDRKFREEEQLLQEREEE-RLRRQERDRKFREEERQL 1035 

Query: 125 RQWNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAELSL 182 

R+ LE+ R+++ R +LE EQ +E +QL R F + R ++ E L 

Sbjct: 1036 RRQELEEQFRQERDRKFRLE-EQIRQEKEEKQLRRQERDRKFREEEQQRRRQEREQQL 1092 

Score = 94 (14.1 bits), Expect = 1.1e+00, P = 6.8e-01 
Identities = 51/166 (30%), Positives = 76/166 (45%) 

Query: 67 KQLSLESSRQVTSESQ--EEPWEEEFGREMR-RQLWLEEEEMWQQRQKKWALLEQEHQEK 123 

++L E R+ E Q +E EE R+ R R+L EEE++ + Q++ L QE+ 
Sbjct: 1250 QELRRERDRKFREEEQLLQEREEERLRRQERARKLREEEEQLLFEEQEEQRL RQER 1305 

Query: 124 LRQWNLED-LAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAELSL 182 

R++ E+ ARE++ R +LE+E R+E EQ R F R E+ E 

Sbjct: 1306 DRRYRAEEQFAREEKSR— RLEREL RQEEEQRRRRERERKFREEQLRRQQEE-EQRR 1359 
Query: 183 VPAPSRTQSAHQSRRPHLPMSPSTQQPALGKQRPMSSVEFTYRPRTRRVP 232 

R QSRR L P T+Q A R E+ R++ P 

Sbjct: 1360 RQLRERQFREDQSRRQVL — EPGTRQFARVPVRSSPLYEYIQEQRSQYRP 1407 

Score = 93 (14.0 bits), Expect = 8.3e+00, P = 1.0e+00 
Identities = 41/145 (28%), Positives = 72/145 (49%) 

Query: 28 DRRFPKKWERPVAESLGHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTSESQEEPW- 86 

+RR ++ER+E ++Q++Q+ L R + QE+ + 

Sbjct: 408 ERRQRQERERELEEQARRQQQWQAEEESERRRQ-RLSARPSLRERQLRAEERQEQEQRFR 466 

Query: 87 -EEEFGREMRRQL-WLEEEEMWQQRQKKWALLEQE--HQEKLRQWNLEDLAREQQRRWVQ 142 
EEE RE R++L +LEEEE Q+R++ L E++ +++ R+ ++ Q RW Q 
Sbjct: 467 EEEEQRRERRQELQFLEEEEQLQRRERAQQLQEEDSFQEDRERRRRQQEQRPGQTWRW-Q 525 

Query: 143 LEKEQESPRR EP EQLGEDVE 162 

L++E + R +P EQL E+ E 

Sbjct: 526 LQEEAQRRRHTLYAKPGQQEQLREEEE 552 

Score = 91 (13.7 bits), Expect = 2.4e+00, P = 9.1e-01 
Identities = 38/110 (34%), Positives = 57/110 (51%) 

Query: 72 ESSRQVTSESQEEPWEE-EFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNL- 129 

E R++ E Q EE E RE R+L EEE++ Q+R+++ L QE KLR+ 
Sbjct: 931 ERERKLREEEQLLRREEQELRRERARKL-REEEQLLQEREEE-RLRRQERARKLREEEQL 988 

Query: 130 EDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAEL 180 

++L +E+ R++ E+EQ RE E+L R F R L + EL 

Sbjct: 989 LRREEQELRQERDRKF — REEEQLLQEREEERLRRQERDRKFREEER — QLRRQEL 1040 

Score = 89 (13.4 bits), Expect = 2.2e+00, P = 8.9e-01 
Identities = 35/138 (25%), Positives = 65/138 (47%) 

Query: 82 QEEPWEEEFGREMRRQLWLEEEEM--WQQRQKKWALLEQEHQEKLRQWNLEDLAREQQRR 139 

Q E++ E+R + + +E E WQ+++++ L E+E Q K R+ + +R+ + + 
Sbjct: 111 QNRRQEDQRRFELRDRQFEDEPERRRWQKQEQERELAEEEEQRKKRERFEQHYSRQYRDK 170 

Query: 140 WVQLEKEQ-ESPRREPEQL GEDVERRIFTPTSRWRDLEKAELSLVPAPSRTQSAHQ 194 

+L++++ E RE EQL GDEF +RE+EL Q + 

Sbjct: 171 EQRLQRQELEERRAEEEQLRRRKGRDAEE— FIEEEQLRRREQQELKR-ELREEEQQRRE 227 



Query: 195 SRRPHLPMSPSTQQPALGKQR 215 

R H ++ L ++R 

Sbjct: 228 RREQHERALQEEEEQLLRQRR 248 



Score = 50 (7.5 bits), Expect = 2.2e+00, P = 8.9e-01 
Identities = 34/160 (21%), Positives = 67/160 (41%) 



Query: 325 RLQSLRQEAINHVQIMKETEASYKAQNLYIFLENIDRL-QSLRLQAWTDKQKGLEEKHRE 383 

R + R+E Q+ +E E + + LE +R Q LR + ++++ E++ R 

Sbjct: 245 RQRRWREEPREQQQLRRELEEIREREQR LEQEERREQQLRREQRLEQEERREQQLRR 301 

Query: 384 CLSSMVTMFPKLQLEWNVHLNIP-EVTSPKPKKCKLPAASPRHIRPSGPTYKQPFLSRHR 442 

L + +L+ E + E + K +L R R ++ L+ 

Sbjct: 302 ELEEIREREQRLEQEERREQRLEQEERREQQLKRELEEIREREQRLEQEERREQLLAEEV 361 

Query: 443 ACVPLQMARQQGKQMEAVWKTEVASSSYAIEKKTPASLPRDQ 484 

+ AR++G+ + W+ ++ S + A + K S PR Q 
Sbjct: 362 R EQARERGESLTRRWQRQLESEAGARQSKV-YSRPRRQ 398 

Score = 40 (6.0 bits), Expect = 1.9e-01, P = 1.7e-01 
Identities = 32/115 (27%), Positives = 47/115 (40%) 

Query: 276 RKNLQLLSEESELRLPHYLRSKAL--ELTTTTMELGALRLQYLCHKYIFYRRL-QSLRQE 332 

R+ QLL E E RL R++ L E E LR Q K+ +L Q +E 

Sbjct: 959 REEEQLLQEREEERLRRQERARKLREEEQLLRREEQELR-QERDRKFREEEQLLQEREEE 1017 



Query: 333 AINHVQI MKETEASYKAQNLYI-FLENIDRLQSLRLQAWTDKQ-KGLEEKHRE 383 

+ + +EE +QL F+DR LQ +K+ K L + R+ 

Sbjct: 1018 RLRRQERDRKFREEERQLRRQELEEQFRQERDRKFRLEEQIRQEKEEKQLRRQERD 1073 

Score = 37 (5.6 bits), Expect = 1.6e+00, P = 7.9e-01 
Identities = 27/108 (25%), Positives = 43/108 (39%) 

Query: 276 RKNLQLLSEESELRLPHYLRSKAL ELTTTTMELGALRLQYLCHKYI FYRRLQSLRQE 332 

R+ QLL E E RL R + L E E LR Q K R+LQE 
Sbjct: 775 REEEQLLQEREEERLRRQERERKLREEEQLLQEREEERLRRQERERKL REEEQLLQE 831 

Query: 333 AINHVQIMKETEASYKAQNLYIFLENIDRLQSLRLQAWTDKQKGLEEKHRE 383 

+EE ++ + E L+R+ ++++ L ++ +E 
Sbjct: 832 REEERLRRQERERKLREEEQLLRQEE-QELRQERARKLREEEQLLRQEEQE 881 
Pedant information for DKFZphtes3_50n23, frame 1 



Report for DKFZphtes3_50n23.1 

[LENGTH] 499 

[MW] 58885.69 

[pI] 9.67 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 10.42 % 

SEQ MTVRSRVADVFGSKDTESLEPVLLPLVDRRFPKKWERPVAESLGHKDKDQEDYFQKGGLQ 

SEG 

PRD ccccccceeecccccccccceeeccccccccccccchhhhhhhcccccccccccccccce 

SEQ IKFHCSKQLSLESSRQVTSESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEH 

SEG XXXXXXXXXX . . xxxxxxxxxxxxxxxxxxx 

PRD eeeecchhhhhhccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ QEKLRQWNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAEL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccccceeeccccccchhhhhhhh 

SEQ SLVPAPSRTQSAHQSRRPHLPMSPSTQQPALGKQRPMSSVEFTYRPRTRRVPTKPKKSAS 

SEG xxxxxxxxxxxxxxx. . . 

PRD hccccccchhhhhccccccccccccccccccccccccceeeeeeccccccccccccceee 

SEQ FPVTGTSIRRLTWPSLQISPANIKKKVYHMDMEAQRKNLQLLSEESELRLPHYLRSKALE 

SEG xxxxxxxx 

PRD ecccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ LTTTTMELGALRLQYLCHKYIFYRRLQSLRQEAINHVQIMKETEASYKAQNLYIFLENID 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ RLQSLRLQAWTDKQKGLEEKHRECLSSMVTMFPKLQLEWNVHLNIPEVTSPKPKKCKLPA 

SEG 

PRD hhhhhhhhhhhhcchhhhhhhhhhhhhhhhccccchhhhhcccccccccccccccccccc 

SEQ ASPRHIRPSGPTYKQPFLSRHRACVPLQMARQQGKQMEAVWKTEVASSSYAIEKKTPASL 

SEG 

PRD ccccccccccccccchhhhhhccchhhhhhhhhcchhhhhhhhhhhhhhhhhhhcccccc 

SEQ PRDQLRGHPDIPRLLTLDV 

SEG 

PRD ccccccccccccccccccc 

(No Prosite data available for DKFZphtes3_50n23.1) 
(No Pfam data available for DKFZphtes3_50n23.1) 
DKFZphtes3_6b21 



group: testes derived 

DKFZphtes3_6b21 encodes a novel 781 amino acid protein with similarity to human KIAA0256 
gene product. 

No informative BLAST results; No predictive prosite, pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to KIAA0256 

complete cDNA, complete cds, EST hits 

Sequenced by BMFZ 

Locus: /map="356.3 cR from top of Chr9 linkage group" 
Insert length: 3360 bp 

Poly A stretch at pos. 3314, polyadenylation signal at pos. 3300 



1 GGCAAGCCGA CGGCCCGCTG CTGGCCTCCG TGACGCGGCC TCCTCCGCGC 
51 CTCGCGGCAT GGCGTCGGAG GGGCCGCGGG AGCCCGAAAG CGAGGGCATC 
101 AAGTTATCAG CAGATGTCAA ACCATTTGTC CCCAGATTTG CCGGGCTCAA 
151 TGTGGCATGG TTAGAGTCCT CAGAAGCATG TGTCTTCCCC AGCTCTGCAG 
201 CCACATACTA TCCGTTTGTT CAGGAACCAC CAGTGACAGA AATGTTTACT 
251 CAGTGCCTGG CTCCCAGTAT CTTTATAACC AACCCAGTTG TTACCGAGGT 
301 TTTCAAACAG TGAAGCATCG AAATGAGAAC ACATGCCCTC TCCCACAAGA 
351 AATGAAAGCT CTGTTTAAGA AGAAAACCTA TGATGAGAAA AAAACGTATG 
401 ATCAGCAAAA GTTTGACAGT GAAAGGGCTG ATGGAACTAT ATCATCTGAG 
451 ATAAAATCAG CTAGAGGTTC ACATCATTTG TCCATTTACG CTGAGAATAG 
501 TTTGAAATCA GATGGTTACC ATAAGCGAAC AGACAGGAAA TCCAGAATCA 
551 TTGCAAAAAA TGTATCTACC TCCAAACCTG AGTTTGAATT TACCACACTG 
601 GACTTTCCTG AACTGCAAGG TGCAGAGAAC AATATGTCAG AGATACAGAA 
651 GCAACCCAAG TGGGGACCTG TCCACTCTGT CTCTACCGAC ATTTCTCTTC 
701 TAAGAGAAGT AGTAAAACCA GCTGCAGTGT TATCAAAGGG TGAAATAGTG 
751 GTGAAAAATA ACCCAAATGA ATCTGTAACT GCTAATGCCG CTACCAATTC 
801 TCCTTCATGT ACAAGAGAGT TATCTTGGAC ACCAATGGGT TATGTTGTTC 
851 GACAGACATT ATCTACAGAA CTGTCAGCAG CCCCTAAAAA TGTTACTTCT 
901 ATGATAAACT TAAAGACCAT TGCTTCATCA GCAGATCCTA AAAATGTTAG 
951 TATACCATCT TCTGAAGCTT TATCTTCGGA TCCTTCCTAC AACAAAGAAA 
1001 AACACATTAT TCATCCTACC CAAAAGTCTA AAGCATCACA AGGTAGTGAC 
1051 CTTGAACAAA ATGAAGCCTC AAGAAAGAAT AAGAAAAAGA AAGAAAAATC 
1101 TACATCAAAA TATGAAGTCC TGACAGTTCA AGAGCCTCCA AGGATTGAAG 
1151 ATGCCGAGGA ATTTCCCAAC CTGGCAGTTG CATCTGAAAG AAGAGACAGA 
1201 ATAGAGACAC CGAAATTTCA ATCTAAGCAG CAGCCACAGG ATAATTTTAA 
1251 AAATAATGTA AAGAAGAGCC AGCTTCCAGT GCAGTTGGAC TTGGGGGGCA 
1301 TGCTGACAGC CCTGGAGAAG AAGCAGCACT CTCAGCATGC AAAGCAGTCC 
1351 TCCAAACCAG TGGTAGTCTC AGTTGGAGCA GTGCCAGTCC TTTCCAAAGA 
1401 ATGTGCATCA GGGGAGAGAG GCCGCCGCAT GAGTCAAATG AAGACCCCGC 
1451 ACAATCCCTT GGACTCCAGC GCCCCACTGA TGAAGAAAGG GAAGCAGAGG 
1501 GAGATCCCCA AGGCCAAGAA GCCAACCTCA CTGAAGAAGA TTATTTTGAA 
1551 AGAACGGCAA GAGAGAAAGC AGCGTCTCCA AGAAAATGCT GTGAGTCCAG 
1601 CTTTTACCAG TGATGACACA CAAGATGGAG AGAGTGGTGG TGATGACCAG 
1651 TTTCCCGAGC AGGCAGAGCT GTCAGGGCCA GAGGGGATGG ACGAACTGAT 
1701 CTCCACTCCT TCGGTTGAGG ACAAGTCTGA AGAGCCACCA GGCACAGAGC 
1751 TCCAGAGGGA CACAGAGGCC TCCCACCTTG CTCCCAATCA CACCACCTTC 
1801 CCTAAGATCC ACAGCCGCAG ATTCAGGGAT TACTGCAGCC AGATGCTTAG 
1851 TAAAGAAGTG GATGCTTGTG TTACCGACCT ACTCAAAGAA CTGGTCCGTT 
1901 TCCAAGACCG TATGTACCAG AAAGATCCAG TCAAGGCCAA GACTAAACGT 
1951 CGACTTGTGT TGGGGTTGAG GGAGGTTCTC AAACACCTGA AGCTCAAAAA 
2001 ACTGAAATGT GTCATTATTT CTCCCAACTG TGAGAAGATA CAGTCAAAAG 
2051 GTGGGCTGGA TGACACTTTG CACACAATTA TTGATTATGC CTGTGAGCAG 
2101 AACATTCCCT TTGTGTTTGC TCTCAACCGC AAAGCTCTGG GGCGCAGTTT 
2151 GAATAAGGCA GTTCCTGTCA GTGTGGTGGG GATCTTCAGC TATGATGGGG 
2201 CCCAGGATCA GTTCCACAAG ATGGTTGAGC TGACAGTGGC GGCCCGACAG 
2251 GCGTACAAGA CCATGCTGGA GAATGTGCAG CAGGAGCTGG TGGGAGAGCC 
2301 CAGGCCTCAG GCACCTCCCA GCCTACCCAC ACAGGGCCCC AGCTGCCCTG 
2351 CAGAAGATGG CCCCCCAGCC CTGAAAGAAA AAGAAGAGCC ACACTACATT 
2401 GAAATCTGGA AAAAACATCT GGAAGCATAC AGTGGATGTA CCCTGGAGCT 
2451 AGAAGAATCC TTGGAGGCTT CAACCTCTCA AATGATGAAT TTGAATTTAT 
2501 GAGAGTTCTT GCCTGTGTGT CTGTATTTTG GGTAAGGAGG GGAGGTCTGA 
2551 AAAAGACTTT GGGGCTTTTT CTTCTGTTTT TCATGACAAT GTAATTTGTG 
2601 TAACTGTTGA ATCTGGAAAT TGATCAGCAT TAAAGGGCAC ATGAAGCAGT 
2651 GTCTGCAGGC GTTCAGTGCT GCGGAGCCTG TTAAAGGTCA CTCAGATGTG 
2701 CAGGTGTTAA TCTTCTCTAA AAGCCTGGTT ATACAGCTCT GGCTTTCTGA 
2751 GCACACTACG GATCTGGAAA ATACTGGAAA ATGTGATACT TAGAATACTT 
2801 TGGCTGCTAA GGAAACTTCC TCTCCATTGC AGAATAGCTG AGCCAAGTGA 
2851 GTGAGTTTGC AGAAAGCAGG TGGTGAGCTC CTGCCTGCTG GAGGTTGCCA 
2901 TGGAGGGCCA TTCCTGCCCG GCAACAGCAC CGTCCTGCAG GGAGCCACTT 
2951 GGCAGAAGGG TGCAGGGCTG CTGGTGTCAG AGCAAGAGGG CTACAGGGAA 
3001 AGGGCCCTTT CTCAGGGGAT GTAGCTTTTT TAAAAGATTT GGGAACACTT 
3051 GGAGGATTTG CTAAAATGAG CCTCAGAAGG AAAATTGGTT TTCTAACCTG 
3101 TGACTTTTTG AAATGAATTA TTCCTTTCAG TCTTTATTTT TCAAAGAAAC 
3151 AATGTGTATT GAAGTACCTA GATTTGTTTG ATAATCAACA AATCTTTCCT 
3201 TTTTCAATGA ACATATTCTG AATGTGGTTT CTGTCTTAGA CCAGGAGGAC 
3251 AGAGTTTGCT TTCATATTTT CCCTGTAAGT AAGAGGGCTT ATTTATTTTA 
3301 AATAAAGAGT AATTATTAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3351 AAAAAAAAAA 



BLAST Results 



Entry HS773347 from database EMBL: 
human STS WI-18160. 
Score = 813, P = 2.9e-30, identities = 167/171 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 157 bp to 2499 bp; peptide length: 781 
Category: similarity to known protein 
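
The ORF coordinates and peptide lengths quoted in these reports can be cross-checked with simple arithmetic: an ORF that spans nucleotides `orf_start` to `orf_end` inclusive and excludes the stop codon should contain a whole number of codons equal to the peptide length. The sketch below assumes 1-based inclusive coordinates; the function name `peptide_length` is illustrative only.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of codons in an ORF given 1-based inclusive coordinates.

    Assumes the quoted range excludes the stop codon.
    """
    span = orf_end - orf_start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"ORF span {span} is not a positive multiple of 3")
    return span // 3


# Coordinates quoted in this report for two clones.
print(peptide_length(157, 2499))  # DKFZphtes3_6b21, frame 1
print(peptide_length(102, 3176))  # DKFZphtes3_6cll, frame 3
```

Both results agree with the peptide lengths stated in the corresponding reports (781 and 1025 residues).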



1 MVRVLRSMCL PQLCSHILSV CSGTTSDRNV YSVPGSQYLY NQPSCYRGFQ 
51 TVKHRNENTC PLPQEMKALF KKKTYDEKKT YDQQKFDSER ADGTISSEIK 
101 SARGSHHLSI YAENSLKSDG YHKRTDRKSR IIAKNVSTSK PEFEFTTLDF 
151 PELQGAENNM SEIQKQPKWG PVHSVSTDIS LLREVVKPAA VLSKGEIVVK 
201 NNPNESVTAN AATNSPSCTR ELSWTPMGYV VRQTLSTELS AAPKNVTSMI 
251 NLKTIASSAD PKNVSIPSSE ALSSDPSYNK EKHIIHPTQK SKASQGSDLE 
301 QNEASRKNKK KKEKSTSKYE VLTVQEPPRI EDAEEFPNLA VASERRDRIE 
351 TPKFQSKQQP QDNFKNNVKK SQLPVQLDLG GMLTALEKKQ HSQHAKQSSK 
401 PVVVSVGAVP VLSKECASGE RGRRMSQMKT PHNPLDSSAP LMKKGKQREI 
451 PKAKKPTSLK KIILKERQER KQRLQENAVS PAFTSDDTQD GESGGDDQFP 
501 EQAELSGPEG MDELISTPSV EDKSEEPPGT ELQRDTEASH LAPNHTTFPK 
551 IHSRRFRDYC SQMLSKEVDA CVTDLLKELV RFQDRMYQKD PVKAKTKRRL 
601 VLGLREVLKH LKLKKLKCVI ISPNCEKIQS KGGLDDTLHT IIDYACEQNI 
651 PFVFALNRKA LGRSLNKAVP VSVVGIFSYD GAQDQFHKMV ELTVAARQAY 
701 KTMLENVQQE LVGEPRPQAP PSLPTQGPSC PAEDGPPALK EKEEPHYIEI 
751 WKKHLEAYSG CTLELEESLE ASTSQMMNLN L 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_6b21, frame 1 

SWISSPROT:Y256_HUMAN HYPOTHETICAL PROTEIN KIAA0256., N = 1, Score = 
786, P = 3.6e-78 

TREMBL:PFMAL3P3_15 gene: "MAL3P3.15"; Plasmodium falciparum MAL3P3, N 
= 2, Score = 161, P = 5.1e-10 

TREMBL:RNNFLH_1 Rat heavy neurofilament subunit (NF-H) mRNA, 3' end., N 
= 1, Score = 150, P = 9.1e-07 



>SWISSPROT:Y256_HUMAN HYPOTHETICAL PROTEIN KIAA0256. 
Length = 635 

HSPs: 

Score = 786 (117.9 bits), Expect = 3.6e-78, P = 3.6e-78 
Identities = 190/424 (44%), Positives = 263/424 (62%) 
Query: 369 KKSQLPVQLDLGGMLTALEKKQHSQHAKQ--SSKPVVVSVGAVPVLSKECASGERGRRMS 426 

KK++ PVQLDLG ML ALEK+Q + A+Q +++P+ +V + ++ + S 

Sbjct: 16 KKNKTPVQLDLGDMLAALEKQQQAMKARQITNTRPLSYTWTAASFHTKDSTNRKPLTKS 75 

Query: 427 Q-MKTPHNPLDSSAPLMKKGKQREIPKAKKPTSLKKIILKERQERKQRLQENAVSPAFTS 485 

Q T N +D ++ KKGK++EI K K+PT+LKK+ILKER+E+K RL + S 
Sbjct: 76 QPCLTSFNSVDIASSKAKKGKEKEIAKLKRPTALKKVILKEREEKKGRLTVD— HNLLGS 133 

Query: 486 DDTQDGESGGDDQFPEQAELSGPEGMDELISTPSVEDKSEEPPG — TELQRDTEASHL — 541 

++ + D P++ G+ + S S+ S+ P T + + + AS 

Sbjct: 134 EEPTEMHLDFIDDLPQEIVSQEDTGLS-MPSDTSLSPASQNSPYCMTPVSQGSPASSGIG 192 

Query: 542 APN-HTTFPKIHSRRFRDYCSQMLSKEVDACVTDLLKELVRFQDRMYQKDPVKAKTKRRL 600 

+P +T KIHS+RFR+YC+Q+L KE+D CVT LL+ELV FQ+R+YQKDPV+AK +RRL 
Sbjct: 193 SPMASSTITKIHSKRFREYCNQVLCKEIDECVTLLLQELVSFQERIYQKDPVRAKARRRL 252 

Query: 601 VLGLREVLKHLKLKKLKCVIISPNCEKIQSKGGLDDTLHTIIDYACEQNIPFVFALNRKA 660 

V+GLREV KH+KL K+KCVIISPNCEKIQSKGGLD+ L+ +I A EQ IPFVFAL RKA 
Sbjct: 253 VMGLREVTKHMKLNKIKCVIISPNCEKIQSKGGLDEALYNVIAMAREQEIPFVFALGRKA 312 

Query: 661 LGRSLNKAVPVSVVGIFSYDGAQDQFHKMVELTVAARQAYKTMLENVQQELVGEPRP 717 

LGR +NK VPVSVVGIF+Y GA+ F+K+VELT AR+AYK M+ ++QE E 
Sbjct: 313 LGRCVNKLVPVSVVGIFNYFGAESLFNKLVELTEEARKAYKDMVAAMEQEQAEEALKNVK 372 

Query: 718 QAPPSLP-TQGPS CPAEDGPPALKEKEEPHYIEIWKKHLEAYSGCTL ELE 766 

+ P + ++PS C P+EEYW++EG EE 

Sbjct: 373 KVPHHMGHSRNPSAASAISFCSVISEP--ISEVNEKEYETNWRNMVETSDGLEASENEKE 430 

Query: 767 ESLEASTSQ 775 

S + STS+ 
Sbjct: 431 VSCKHSTSE 439 
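
The percentages on the "Identities" and "Positives" lines appear to be truncated rather than rounded (for example, 190/424 is about 44.8% but is printed as 44%). A minimal sketch of that convention, assuming integer truncation; the function name `blast_percent` is illustrative and not taken from the report:

```python
def blast_percent(matches: int, aligned_length: int) -> int:
    """Percentage of matching columns, truncated toward zero."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length


# Counts copied from alignments in this report.
print(blast_percent(190, 424), blast_percent(263, 424))
print(blast_percent(576, 1033), blast_percent(750, 1033))
```

This is useful when re-deriving a percentage from a count whose printed digits were damaged in extraction.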



Pedant information for DKFZphtes3_6b21, frame 1 



Report for DKFZphtes3_6b21.1 



[LENGTH] 781 

[MW] 87393.44 

[pI] 8.94 

[HOMOL] SWISSPROT:Y256_HUMAN HYPOTHETICAL PROTEIN KIAA0256. 4e-75 

[PROSITE] MYRISTYL 4 

[PROSITE] AMIDATION 1 

[PROSITE] CAMP_PHOSPHO_SITE 3 

[PROSITE] CK2_PHOSPHO_SITE 16 

[PROSITE] TYR_PHOSPHO_SITE 4 

[PROSITE] PKC_PHOSPHO_SITE 16 

[PROSITE] ASN_GLYCOSYLATION 6 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 8.45 % 



SEQ MVRVLRSMCLPQLCSHILSVCSGTTSDRNVYSVPGSQYLYNQPSCYRGFQTVKHRNENTC 

SEG 

PRD ccceeeeeccceeeeeeeeeeccccccccccccccccccccccceeeceeeeeecccccc 

SEQ PLPQEMKALFKKKTYDEKKTYDQQKFDSERADGTISSEIKSARGSHHLSIYAENSLKSDG 

SEG xxxxxxxxxxxx 

PRD cccchhhhhhhhhhccchhhhhhhhhhhccccccchhhhhhhcccceeeeeeeecccccc 

SEQ YHKRTDRKSRIIAKNVSTSKPEFEFTTLDFPELQGAENNMSEIQKQPKWGPVHSVSTDIS 

SEG 

PRD cccccchhhhheeeccccccccceeecccccccccccchhhhhhccccccccceeecchh 

SEQ LLREVVKPAAVLSKGEIVVKNNPNESVTANAATNSPSCTRELSWTPMGYVVRQTLSTELS 

SEG 

PRD hhhhhhheeeeecccceeeeccccceeeeeecccccccceeeeeccceeeeeeccccccc 

SEQ AAPKNVTSMINLKTIASSADPKNVSIPSSEALSSDPSYNKEKHIIHPTQKSKASQGSDLE 

SEG 

PRD ccccceeeeehhhhhhcccccceeeecccccccccccccccceeechhhhhhhcccccch 

SEQ QNEASRKNKKKKEKSTSKYEVLTVQEPPRIEDAEEFPNLAVASERRDRIETPKFQSKQQP 

SEG . . . .xxxxxxxxxxxxxx 

PRD hhhhccccccccccccceeeeeecccccchhhhhhccchhhhhhhhhhhhcccccccccc 

SEQ QDNFKNNVKKSQLPVQLDLGGMLTALEKKQHSQHAKQSSKPVVVSVGAVPVLSKECASGE 

SEG xxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccchhhhhhhhhhhhhhhhhhhccceeeeeeeeeeeecccccc 
SEQ RGRRMSQMKTPHNPLDSSAPLMKKGKQREIPKAKKPTSLKKIILKERQERKQRLQENAVS 

SEG 

PRD chhhhhhcccccccccccccchhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhcc 

SEQ PAFTSDDTQDGESGGDDQFPEQAELSGPEGMDELISTPSVEDKSEEPPGTELQRDTEASH 

SEG 

PRD ccccccccccccccccccchhhhhhcccccceeeeccccccccccccccccccccccccc 

SEQ LAPNHTTFPKIHSRRFRDYCSQMLSKEVDACVTDLLKELVRFQDRMYQKDPVKAKTKRRL 

SEG 

PRD ccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhh 

SEQ VLGLREVLKHLKLKKLKCVIISPNCEKIQSKGGLDDTLHTIIDYACEQNIPFVFALNRKA 

SEG xxxxxxxxxx 

PRD hhhhhhhhhhhhhhhheeeeecccccccccccccchhhhhhhhhhhhcccceeeeccccc 

SEQ LGRSLNKAVPVSVVGIFSYDGAQDQFHKMVELTVAARQAYKTMLENVQQELVGEPRPQAP 

SEG 

PRD cccccccceeeeeeeeecccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccc 

SEQ PSLPTQGPSCPAEDGPPALKEKEEPHYIEIWKKHLEAYSGCTLELEESLEASTSQMMNLN 

SEG xxxxxxxxxxxxx 

PRD cccccccccccccccchhhhhhcccceeeehhhhhhhhhchhhhhhhhhhhhhhhccccc 

SEQ L 

SEG 

PRD c 




Prosite for DKFZphtes3_6b21.1 



PS00001 135->139 ASN_GLYCOSYLATION PDOC00001 
PS00001 159->163 ASN_GLYCOSYLATION PDOC00001 
PS00001 204->208 ASN_GLYCOSYLATION PDOC00001 
PS00001 245->249 ASN_GLYCOSYLATION PDOC00001 
PS00001 263->267 ASN_GLYCOSYLATION PDOC00001 
PS00001 544->548 ASN_GLYCOSYLATION PDOC00001 
PS00004 71->75 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 423->427 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 454->458 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 26->29 PKC_PHOSPHO_SITE PDOC00005 
PS00005 51->54 PKC_PHOSPHO_SITE PDOC00005 
PS00005 88->91 PKC_PHOSPHO_SITE PDOC00005 
PS00005 101->104 PKC_PHOSPHO_SITE PDOC00005 
PS00005 115->118 PKC_PHOSPHO_SITE PDOC00005 
PS00005 125->128 PKC_PHOSPHO_SITE PDOC00005 
PS00005 138->141 PKC_PHOSPHO_SITE PDOC00005 
PS00005 288->291 PKC_PHOSPHO_SITE PDOC00005 
PS00005 305->308 PKC_PHOSPHO_SITE PDOC00005 
PS00005 316->319 PKC_PHOSPHO_SITE PDOC00005 
PS00005 343->346 PKC_PHOSPHO_SITE PDOC00005 
PS00005 351->354 PKC_PHOSPHO_SITE PDOC00005 
PS00005 398->401 PKC_PHOSPHO_SITE PDOC00005 
PS00005 458->461 PKC_PHOSPHO_SITE PDOC00005 
PS00005 553->556 PKC_PHOSPHO_SITE PDOC00005 
PS00005 596->599 PKC_PHOSPHO_SITE PDOC00005 
PS00006 24->28 CK2_PHOSPHO_SITE PDOC00006 
PS00006 74->78 CK2_PHOSPHO_SITE PDOC00006 
PS00006 139->143 CK2_PHOSPHO_SITE PDOC00006 
PS00006 146->150 CK2_PHOSPHO_SITE PDOC00006 
PS00006 193->197 CK2_PHOSPHO_SITE PDOC00006 
PS00006 257->261 CK2_PHOSPHO_SITE PDOC00006 
PS00006 297->301 CK2_PHOSPHO_SITE PDOC00006 
PS00006 317->321 CK2_PHOSPHO_SITE PDOC00006 
PS00006 323->327 CK2_PHOSPHO_SITE PDOC00006 
PS00006 384->388 CK2_PHOSPHO_SITE PDOC00006 
PS00006 484->488 CK2_PHOSPHO_SITE PDOC00006 
PS00006 493->497 CK2_PHOSPHO_SITE PDOC00006 
PS00006 506->510 CK2_PHOSPHO_SITE PDOC00006 
PS00006 519->523 CK2_PHOSPHO_SITE PDOC00006 
PS00006 640->644 CK2_PHOSPHO_SITE PDOC00006 
PS00006 702->706 CK2_PHOSPHO_SITE PDOC00006 
PS00007 73->82 TYR_PHOSPHO_SITE PDOC00007 
PS00007 581->588 TYR_PHOSPHO_SITE PDOC00007 
PS00007 740->748 TYR_PHOSPHO_SITE PDOC00007 
PS00007 740->748 TYR_PHOSPHO_SITE PDOC00007 
PS00008 93->99 MYRISTYL PDOC00008 
PS00008 155->161 MYRISTYL PDOC00008 
PS00008 380->386 MYRISTYL PDOC00008 
PS00008 633->639 MYRISTYL PDOC00008 
PS00009 421->425 AMIDATION PDOC00009 
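
The ASN_GLYCOSYLATION entries (PS00001) in the Prosite listing correspond to the published consensus pattern N-{P}-[ST]-{P}: an asparagine, any residue except proline, a serine or threonine, then any residue except proline. The sketch below scans a peptide for that pattern with a regular expression. It assumes, from the ranges printed in this report, that each hit is written as `start->start+4` with 1-based positions; the function name `find_n_glycosylation` and the `offset` parameter are illustrative only.

```python
import re

# Lookahead so that overlapping matches are all reported.
N_GLYC = re.compile(r"(?=N[^P][ST][^P])")


def find_n_glycosylation(peptide: str, offset: int = 1) -> list[tuple[int, int]]:
    """Return (start, start + 4) for each N-{P}-[ST]-{P} match.

    `offset` is the 1-based position of the first residue of `peptide`
    within the full-length protein.
    """
    hits = []
    for m in N_GLYC.finditer(peptide.upper()):
        start = m.start() + offset
        hits.append((start, start + 4))
    return hits


# Residues 131-140 of the DKFZphtes3_6b21 peptide as printed in this report.
print(find_n_glycosylation("IIAKNVSTSK", offset=131))
```

Run on the fragment above, the scan reports the site beginning at residue 135, in line with the first ASN_GLYCOSYLATION row of the listing.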

(No Pfam data available for DKFZphtes3_6b21.1) 
DKFZphtes3_6cll 



group: signal transduction 

DKFZphtes3_6cll encodes a novel 1025 amino acid protein with similarity to A. ambisexualis 
antheridiol steroid receptor. 

The novel protein is a putative steroid receptor. It shares similarity with yeast YNL132w and 
contains the ATP/GTP-binding site motif A (P-loop) and RGD site, similar to the A. 
ambisexualis antheridiol steroid receptor. 

The new protein can find application in modulating/blocking the expression of genes controlled 
by this receptor. 



strong similarity to YNL132w 

strong similarity to S.pombe/YDK9_SCHPO, S.cerevisiae/YNL132w, 
C.elegans/F55A12.8 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 3966 bp 

Poly A stretch at pos. 3890, polyadenylation signal at pos. 3873 



1 GCTGTGCCTT CTCTTTCGGA GTTGTTCCGT GCTCCCACGT GCTTCCCCTT 
51 CTCCACTGGC TGGGATCCCC CGGGCTCGGG GCGCAGTAAT AATTTTTCAC 
101 CATGCATCGG AAAAAGGTGG ATAACCGAAT CCGGATTCTC ATTGAGAATG 
151 GAGTAGCTGA GCGGCAAAGA TCTCTCTTTG TTGTAGTTGG GGATCGAGGA 
201 AAAGATCAGG TGGTAATACT TCATCACATG TTATCCAAAG CAACTGTGAA 
251 GGCTCGGCCT TCAGTGCTGT GGTGTTATAA GAAAGAGCTG GGGTTTAGCA 
301 GTCACCGGAA GAAAAGAATG CGACAGCTGC AGAAGAAAAT AAAGAATGGA 
351 ACACTGAACA TAAAGCAGGA CGACCCCTTT GAACTCTTCA TAGCAGCCAC 
401 AAACATTCGC TACTGCTACT ACAACGAGAC CCACAAGATC CTGGGCAATA 
451 CCTTCGGCAT GTGTGTGCTG CAGGATTTTG AAGCCTTAAC TCCAAACTTG 
501 CTGGCCAGGA CTGTAGAAAC AGTGGAAGGT GGTGGGCTAG TGGTCATCCT 
551 CCTACGGACC ATGAACTCAC TCAAGCAATT GTACACAGTG ACTATGGATG 
601 TGCATTCCAG GTACAGAACT GAGGCCCATC AGGATGTGGT GGGAAGATTT 
651 AATGAAAGGT TTATTCTGTC TCTGGCCTCT TGTAAGAAGT GTCTCGTCAT 
701 TGATGACCAG CTCAACATCC TGCCCATCTC CTCCCACGTT GCCACCATGG 
751 AGGCCCTGCC TCCCCAGACT CCGGATGAGA GTCTTGGTCC TTCTGATCTG 
801 GAGCTGAGGG AGTTGAAGGA GAGCTTGCAG GACACCCAGC CTGTGGGTGT 
851 GTTGGTGGAC TGCTGTAAGA CTCTAGACCA GGCCAAAGCT GTCTTGAAAT 
901 TTATCGAGGG CATCTCTGAA AAGACCCTGA GGAGTACTGT TGCACTCACA 
951 GCTGCTCGAG GACGGGGAAA ATCTGCAGCC CTGGGATTGG CGATTGCTGG 
1001 GGCGGTGGCA TTTGGGTACT CCAATATCTT TGTTACCTCC CCAAGCCCTG 
1051 ATAACCTCCA TACTCTGTTT GAATTTGTAT TTAAAGGATT TGATGCTCTG 
1101 CAATATCAGG AACATCTGGA TTATGAGATT ATCCAGTCTC TAAATCCTGA 
1151 ATTTAACAAA GCAGTGATCA GAGTGAATGT ATTTCGAGAA CACAGGCAGA 
1201 CTATTCAGTA TATACATCCT GCAGATGCTG TGAAGCTGGG CCAGGCTGAA 
1251 CTAGTTGTGA TTGATGAAGC TGCCGCCATC CCCCTCCCCT TGGTGAAGAG 
1301 CCTACTTGGC CCCTACCTTG TTTTCATGGC ATCCACCATC AATGGCTATG 
1351 AGGGCACTGG CCGGTCACTG TCCCTCAAGC TAATTCAGCA GCTCCGTCAA 
1401 CAGAGCGCCC AGAGCCAGGT CAGCACCACT GCTGAGAATA AGACCACGAC 
1451 GACAGCCAGA TTGGCATCAG CGCGGACACT GCATGAGGTT TCCCTCCAGG 
1501 AGTCAATCCG ATACGCCCCT GGGGATGCAG TGGAGAAGTG GCTGAATGAC 
1551 TTGCTGTGCC TGGATTGCCT CAACATCACT CGGATAGTCT CAGGCTGCCC 
1601 CTTGCCTGAA GCTTGTGAAC TGTACTATGT TAATAGAGAT ACCCTCTTTT 
1651 GCTACCACAA GGCCTCTGAA GTTTTCCTCC AACGGCTTAT GGCCCTCTAC 
1701 GTGGCTTCTC ACTACAAGAA CTCTCCCAAT GATCTCCAGA TGCTCTCCGA 
1751 TGCACCTGCT CACCATCTCT TCTGCCTTCT GCCTCCTGTG CCCCCCACCC 
1801 AGAATGCCCT TCCAGAAGTG CTTGCTGTTA TCCAGGTGTG CCTTGAAGGG 
1851 GAGATTTCTC GCCAGTCCAT CTTGAACAGT CTGTCTCGAG GCAAGAAGGC 
1901 TTCAGGGGAC CTGATTCCAT GGACAGTGTC AGAACAGTTC CAAGATCCAG 
1951 ACTTTGGTGG TCTGTCTGGT GGAAGGGTCG TTCGCATTGC TGTTCACCCA 
2001 GATTATCAAG GGATGGGCTA TGGCAGCCGT GCTCTGCAGC TGCTGCAGAT 
2051 GTACTATGAA GGCAGGTTTC CTTGTCTGGA GGAAAAGGTC CTTGAGACAC 
2101 CACAGGAAAT TCACACCGTA AGCAGCGAGG CTGTCAGCTT GTTGGAAGAG 
2151 GTCATCACTC CCCGGAAGGA CCTGCCTCCT TTACTCCTCA AATTGAATGA 
2201 GAGGCCTGCC GAACGCCTGG ATTACCTGGG TGTTTCCTAT GGCTTGACCC 
2251 CCAGGCTCCT CAAGTTCTGG AAACGAGCTG GATTTGTTCC TGTTTATCTG 
2301 AGACAGACCC CGAATGACCT GACCGGAGAG CACTCGTGCA TCATGCTGAA 
2351 GACGCTCACT GATGAGGATG AGGCTGACCA GGGAGGCTGG CTTGCAGCCT 
2401 TCTGGAAAGA TTTCCGACGG CGGTTCCTAG CCTTGCTCTC CTACCAGTTC 
2451 AGTACCTTCT CTCCTTCCCT GGCTCTGAAC ATCATTCAGA ACAGGAACAT 
2501 GGGGAAGCCA GCCCAGCCTG CCCTGAGCCG GGAGGAGCTG GAAGCACTCT 
2551 TCCTCCCCTA TGACCTGAAG CGGCTGGAGA TGTATTCACG GAATATGGTG 
2601 GACTATCACC TCATCATGGA CATGATCCCG GCCATCTCTC GCATCTATTT 
2651 CCTGAACCAG CTGGGGGACC TGGCCCTGTC TGCGGCTCAG TCGGCTCTTC 
2701 TCTTGGGGAT TGGCCTGCAG CATAAGTCTG TGGACCAGCT GGAAAAGGAG 
2751 ATTGAGCTGC CCTCGGGCCA GTTGATGGGA CTTTTCAACC GGATCATCCG 
2801 CAAAGTTGTG AAGCTATTTA ATGAAGTTCA GGAAAAGGCC ATTGAGGAGC 
2851 AGATGGTGGC AGCGAAGGAT GTGGTCATGG AGCCCACGAT GAAGACCCTC 
2901 AGTGACGACC TAGATGAAGC AGCAAAGGAA TTTCAGGAGA AACACAAGAA 
2951 GGAAGTAGGG AAGCTGAAGA GCATGGACCT CTCTGAATAC ATAATCCGTG 
3001 GGGACGATGA AGAGTGGAAT GAAGTTTTGA ACAAAGCTGG GCCGAACGCC 
3051 TCGATCATCA GCCTGAAAAG TGACAAGAAA AGGAAGTTAG AGGCCAAACA 
3101 AGAACCCAAA CAGAGCAAGA AGTTGAAGAA CAGAGAGACA AAGAACAAAA 
3151 AAGATATGAA ACTGAAGCGG AAGAAATAGT GAAGAGAAAC TCGGGCATCT 
3201 GTGTTTGATC ATGGGAAGAT ACTCTCACTA ACTGAACCCT CTCTGGCTGG 
3251 ACTGTTAAAA GCAACGAGAG GCCCCGGCAC ACCTGGAAGC TGGCCGCGAA 
3301 TTCGGCCTCT GGGCCTGTGT GTCTGTGAGC TCAACCTGGC TAAAGGCAGA 
3351 GTCACTCCCA AATGGGTCTC TTTAGAACTT GATGGCTGGG CACTGCCATC 
3401 TCTAGAATTG CCACGAGTCT CTCTCTTCCT GCCCAGTCCA GGGCCCTCCT 
3451 TTCCTATAAG TTCATATTTT GCTTTGAGCC AGCTTTTTAG TCTCATTCCC 
3501 ACACATGTGG AAGCCACGTT GCCTCTCGAC CGCCTGAGGC CCTTAAGTAC 
3551 ATCGCTTTCT GGTGGTGCCC AGGAGGCTGC TGCTGGGCCG CTGGGTCTCT 
3601 CTTTGTGGAC TTGTACCTGG AGCAGGAGGA ACTCCAGTCC GTCCCGGCAT 
3651 CCATGGCAGC CCGCGGTTAG GTGCGCCAGG GTTTGCTGAT GTTGTCTTGT 
3701 GCTGTTCCAC TCTTGGCTCC AGCAGACCCA CTGTCCCAGA AAAGCCTGAT 
3751 CCTGTAGTTT ATGTAGAATG CCACATCTGC GTCCTCAAGA CCTGTTTCAT 
3801 CCATTTGGGA AAAGATGTTG GGAAAGGCCA CTTTGCTCGC AGGGGTGAGG 
3851 GGAAGGATAG AGAATCTATT TTTAATAAAT AACATTCTAG AATGAAAAAA 
3901 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3951 AAAAAAAAAA AAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 102 bp to 3176 bp; peptide length: 1025 
Category: similarity to unknown protein 
Classification: unclassified 
Prosite motifs: RGD (966-969) 
ATP_GTP_A (284-292) 



1 MHRKKVDNRI RILIENGVAE RQRSLFVVVG DRGKDQVVIL HHMLSKATVK 
51 ARPSVLWCYK KELGFSSHRK KRMRQLQKKI KNGTLNIKQD DPFELFIAAT 
101 NIRYCYYNET HKILGNTFGM CVLQDFEALT PNLLARTVET VEGGGLVVIL 
151 LRTMNSLKQL YTVTMDVHSR YRTEAHQDVV GRFNERFILS LASCKKCLVI 
201 DDQLNILPIS SHVATMEALP PQTPDESLGP SDLELRELKE SLQDTQPVGV 
251 LVDCCKTLDQ AKAVLKFIEG ISEKTLRSTV ALTAARGRGK SAALGLAIAG 
301 AVAFGYSNIF VTSPSPDNLH TLFEFVFKGF DALQYQEHLD YEIIQSLNPE 
351 FNKAVIRVNV FREHRQTIQY IHPADAVKLG QAELVVIDEA AAIPLPLVKS 
401 LLGPYLVFMA STINGYEGTG RSLSLKLIQQ LRQQSAQSQV STTAENKTTT 
451 TARLASARTL HEVSLQESIR YAPGDAVEKW LNDLLCLDCL NITRIVSGCP 
501 LPEACELYYV NRDTLFCYHK ASEVFLQRLM ALYVASHYKN SPNDLQMLSD 
551 APAHHLFCLL PPVPPTQNAL PEVLAVIQVC LEGEISRQSI LNSLSRGKKA 
601 SGDLIPWTVS EQFQDPDFGG LSGGRVVRIA VHPDYQGMGY GSRALQLLQM 
651 YYEGRFPCLE EKVLETPQEI HTVSSEAVSL LEEVITPRKD LPPLLLKLNE 
701 RPAERLDYLG VSYGLTPRLL KFWKRAGFVP VYLRQTPNDL TGEHSCIMLK 
751 TLTDEDEADQ GGWLAAFWKD FRRRFLALLS YQFSTFSPSL ALNIIQNRNM 
801 GKPAQPALSR EELEALFLPY DLKRLEMYSR NMVDYHLIMD MIPAISRIYF 
851 LNQLGDLALS AAQSALLLGI GLQHKSVDQL EKEIELPSGQ LMGLFNRIIR 
901 KVVKLFNEVQ EKAIEEQMVA AKDVVMEPTM KTLSDDLDEA AKEFQEKHKK 
951 EVGKLKSMDL SEYIIRGDDE EWNEVLNKAG PNASIISLKS DKKRKLEAKQ 
1001 EPKQSKKLKN RETKNKKDMK LKRKK 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_6cll, frame 3 

TREMBL : CEAF3 130_4 gene: "F55A12.8"; Caenorhabditis elegans cosmid 
F55A12., N = 1, Score = 2782, P = 1.1e-289 

PIR:S55151 probable membrane protein YNL132w - yeast (Saccharomyces 
cerevisiae), N = 2, Score = 2549, P = 3.5e-273 

SWISSPROT:YXX1_ACHAM HYPOTHETICAL PROTEIN (FRAGMENT)., N = 1, Score = 
1013, P = 3.2e-102 

SWISSPROT:YDK9_SCHPO HYPOTHETICAL 116.5 KD PROTEIN C20G8.09C IN 
CHROMOSOME I., N = 1, Score = 2843, P = 3.8e-296 



>SWISSPROT:YDK9_SCHPO HYPOTHETICAL 116.5 KD PROTEIN C20G8.09C IN CHROMOSOME 
I. 

Length = 1033 

HSPs: 

Score = 2843 (426.6 bits), Expect = 3.8e-296, P = 3.8e-296 
Identities = 576/1033 (55%), Positives = 750/1033 (72%) 

Query: 1 MHRKKVDNRIRILIENGVAERQRSLFVVVGDRGKDQVVILHHMLSKATVKARPSVLWCYK 60 

M +K +D+RI LI+NG 'E+QRS FVVVGDR +DQVV LH +LS++ V ARP+VLW YK 
Sbjct: 1 MPKKALDSRIPTLIKNGCQEKQRSFFVVVGDRARDQVVNLHWLLSQSKVAARPNVLWMYK 60 

Query: 61 KEL-GFSSHRKKRMRQLQKKIKNGTLNIKQDDPFELFIAATNIRYCYYNETHKILGNTFG 119 

K+L GF+SHRKKR +++K+IK G + +DPFELF + TNIRYCYY E+ KILG T+G 
Sbjct: 61 KDLLGFTSHRKKRENKIKKEIKRGIRDPNSEDPFELFCSITNIRYCYYKESEKILGQTYG 120 

Query: 120 MCVLQDFEALTPNLLARTVETVEGGGLVVILLRTMNSLKQLYTVTMDVHSRYRTEAHQDV 179 

M VLQDFEALTPNLLART+ETVEGGG+VV+LL +NSLKQLYT++MD+HSRYRTEAH DV 
Sbjct: 121 MLVLQDFEALTPNLLARTIETVEGGGIVVLLLHKLNSLKQLYTMSMDIHSRYRTEAHSDV 180 

Query: 180 VGRFNERFILSLASCKKCLVIDDQLNILPISSHVATMEALPPQTPDESLGPSDLELRELK 239 

RFNERFILSL +C+ CLVIDD+LN+LPIS ++ALPP +++ + ++EL+ 

Sbjct: 181 TARFNERFILSLGNCENCLVIDDELNVLPISGG-KNVKALPPTLEEDN — STQNSIKELQ 237 

Query: 240 ESLQDTQPVGVLVDCCKTLDQAKAVLKFIEGISEKTLRSTVALTAARGRGKSAALGLAIA 299 

ESL + P G LV KTLDQA+AVL F+E I EK+L+ TV+LTA RGRGKSAALGLAIA 
Sbjct: 238 ESLGEDHPAGALVGVTKTLDQARAVLTFVESIVEKSLKGTVSLTAGRGRGKSAALGLAIA 297 

Query: 300 GAVAFGYSNIFVTSPSPDNLHTLFEFVFKGFDALQYQEHLDYEIIQSLNPEFNKAVIRVN 359 

A+A GYSNIF+TSPSP+NL TLFEF+FKGFDAL Y+EH+DY+IIQS NP ++ A++RVN 
Sbjct: 298 AAIAHGYSNIFITSPSPENLKTLFEFIFKGFDALNYEEHVDYDIIQSTNPAYHNAIVRVN 357 

Query: 360 VFREHRQTIQYIHPADAVKLGQAELVVIDEAAAIPLPLVKSLLGPYLVFMASTINGYEGT 419 

+FR+HRQTIQYI P D+ LGQAELVVIDEAAAIPLPLV+ L+GPYLVFMASTINGYEGT 
Sbjct: 358 IFRDHRQTIQYISPEDSNVLGQAELVVIDEAAAIPLPLVRKLIGPYLVFMASTINGYEGT 417 

Query: 420 GRSLSLKLIQQLRQQSAQSQVSTTAENKTTTTARLASARTLHEVSLQESIRYAPGDAVEK 479 

GRSLSLKL+QQLR+QS S + NK+ + + + S RTL E+SL E IRYA GD +E 

Sbjct: 418 GRSLSLKLLQQLREQSRI--YSGSGNNKSDSQSHI-SGRTLKEISLDEPIRYAMGDRIEL 474 

Query: 480 WLNDLLCLDCLN-ITRIVS-GCPLPEACELYYVNRDTLFCYHKASEVFLQRLMALYVASH 537 

WLN LLCLD + ++R+ + G P P C LY V+RDTLF YH SE FLQR+M+LYVASH 
Sbjct: 475 WLNKLLCLDAASYVSRMATQGFPHPSECSLYRVSRDTLFSYHPISEAFLQRMMSLYVASH 534 

Query: 538 YKNSPNDLQMLSDAPAHHLFCLLPPVPPTQNALPEVLAVIQVCLEGEISRQSILNSLSRG 597 

YKNSPNDLQ++SDAPAH LF LLPPV LP+ + VIQ+ LEG ISR+SI+NSLSRG 

Sbjct: 535 YKNSPNDLQLMSDAPAHQLFVLLPPVDLKNPKLPDPICVIQLALEGSISRESIMNSLSRG 594 

Query: 598 KKASGDLIPWTVSEQFQDPDFGGLSGGRVVRIAVHPDYQGMGYGSRALQLLQMYYEGRFP 657 

++A GDLIPW +S+QFQD +F L G R+VRIAV P++ MGYG+RA+QLL Y+EG+F 
Sbjct: 595 QRAGGDLIPWLISQQFQDENFAALGGARIVRIAVSPEHVKMGYGTRAMQLLHEYFEGKFI 654 

Query: 658 CLEEKVLETPQEIHTVSSEAV---SLLEEVITPR--KDLPPLLLKLNERPAERLDYLGVS 712 

E+ ++E+ +LEIRK +PPLLLKL+E E L Y+GVS 

Sbjct: 655 SASEEFKAVKHSLKRIGDEEIENTALQTEKIHVRDAKTMPPLLLKLSELQPEPLHYVGVS 714 

Query: 713 YGLTPRLLKFWKRAGFVPVYLRQTPNDLTGEHSCIMLKTLTDEDEADQGGWLAAFWKDFR 772 

YGLTP L KFWKR G+ P+YLRQT NDLTGEH+C+ML+ L D WL AF ++F 

Sbjct: 715 YGLTPSLQKFWKREGYCPLYLRQTANDLTGEHTCVMLRVLEGRDSE----WLGAFAQNFY 770 

Query: 773 RRFLALLS YQFSTFSPSLALNI IQNRNMGKP AQPALSREELEALFLPYDLKRLEMY 828 

RRFL+LL YQF F+ AL+++ N G + L+ EE+ +F YDLKRLE Y 

Sbjct: 771 RRFLSLLGYQFREFAAITALSVLDACNNGTKYVVNSTSKLTNEEINNVFESYDLKRLESY 830 
Query: 829 SRNMVDYHLIMDMIPAISRIYFLNQLGD-LALSAAQSALLLGIGLQHKSVDQLEKEIELP 887 

S N++DYH+I+D++P ++ +YF + D + LS Q ++LL +GLQ+K++D LEKE LP 
Sbjct: 831 SNNLLDYHVIVDLLPKLAHLYFSGKFPDSVKLSPVQQSVLLALGLQYKTIDTLEKEFNLP 890 

Query: 888 SGQLMGLFNRIIRKVVKLFNEVQEKAIEEQMVAAKDVVME--------PTMKTLSDDLDE 939 

S QL+ + ++ +K++K +E++ K IEE++ + K P ++L ++L E 

Sbjct: 891 SNQLLAMLVKLSKKIMKCIDEIETKDIEEELGSNKKTESSNSKLPEFTPLQQSLEEELQE 950 

Query: 940 AAKEFQ-EKHKKEVGKLKSMDLSEYIIRGDDEEWNEVLNKAGPNASIISLKSDKKRKLEA 998 

A E +K+ + ++DL +Y IRG++E+W KA N I R + 

Sbjct: 951 GADEAMLALREKQRELINAIDLEKYAIRGNEEDW-----KAAEN-QIQKTNGKGARVVSI 1004 

Query: 999 KQEPKQSKKL--KNRETKNKKDMKLKRKK 1025 

K E +++ L +++TK K K K +K 
Sbjct: 1005 KGEKRKNNSLDASDKKTKEKPSSKKKFRK 1033 



Pedant information for DKFZphtes3_6cll, frame 3 



Report for DKFZphtes3_6cll.3 



[LENGTH] 1025 

[MW] 115704.57 

[pI] 8.50 

[HOMOL] PIR:S55151 probable membrane protein YNL132w - yeast (Saccharomyces cerevisiae) 
0.0 

[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YNL132w] 0.0 

[FUNCAT] r general function prediction [H. influenzae, HI1254] 2e-05 

[PROSITE] ATP_GTP_A 1 

[PROSITE] RGD 1 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 11.80 % 



SEQ MHRKKVDNRIRILIENGVAERQRSLFVVVGDRGKDQVVILHHMLSKATVKARPSVLWCYK 

SEG 

PRD cccccccchhhhhhcccccccceeeeeeeeccccceeeeehhhhhhhhhhccceeehhhh 

SEQ KELGFSSHRKKRMRQLQKKIKNGTLNIKQDDPFELFIAATNIRYCYYNETHKILGNTFGM 

SEG 

PRD hhhcccchhhhhhhhhhhhhhhhcccccccccceeeecccceeeeeccccceeeccccee 

SEQ CVLQDFEALTPNLLARTVETVEGGGLVVILLRTMNSLKQLYTVTMDVHSRYRTEAHQDVV 

SEG XXXXXXXXXXXXXXX 

PRD eehhhhhccccchhhhhhhhhcccceeeeeeccchhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ GRFNERFILSLASCKKCLVIDDQLNILPISSHVATMEALPPQTPDESLGPSDLELRELKE 

SEG 

PRD hhhhhhhhhhhcccceeeeeecceeeecccccccccccccccccccccccchhhhhhhhh 

SEQ SLQDTQPVGVLVDCCKTLDQAKAVLKFIEGISEKTLRSTVALTAARGRGKSAALGLAIAG 

SEG xxxxxxxxx 

PRD hhcccccceeeeehhhhhhhhhhhhhhhhhhhhhhhhhhheeeccccccchhhhhhhhhh 

SEQ AVAFGYSNIFVTSPSPDNLHTLFEFVFKGFDALQYQEHLDYEIIQSLNPEFNKAVIRVNV 

SEG xxx 

PRD hhhhcccceeecccccccchhhhhhhhhhhhhhhhhhhhhheeeeeccccccceeeeeeh 

SEQ FREHRQTIQYIHPADAVKLGQAELVVIDEAAAIPLPLVKSLLGPYLVFMASTINGYEGTG 

SEG 

PRD hhhhhhheeeeccccccccccceeeehhhhhccchhhhhhhccceeeeeeeccccccccc 

SEQ RSLSLKLIQQLRQQSAQSQVSTTAENKTTTTARLASARTLHEVSLQESIRYAPGDAVEKW 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cchhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhhhhhhhhhceeeccccchhhh 

SEQ LNDLLCLDCLNITRIVSGCPLPEACELYYVNRDTLFCYHKASEVFLQRLMALYVASHYKN 

SEG xxxxxxxxxxx 

PRD hhhhhhcccccceeeccccccccceeeeeeeccccccccchhhhhhhhhhhhhhhhhccc 

SEQ SPNDLQMLSDAPAHHLFCLLPPVPPTQNALPEVLAVIQVCLEGEISRQSILNSLSRGKKA 

SEG 

PRD cccccccccccccceeeeeeccccccccccchhhhhhhhhhccccchhhhhhhhcccccc 

SEQ SGDLIPWTVSEQFQDPDFGGLSGGRVVRIAVHPDYQGMGYGSRALQLLQMYYEGRFPCLE 

SEG 

PRD cccchhhhhhhhhhhccccccccceeeeeeccccccccccchhhhhhhhhhhhcccchhh 

SEQ EKVLETPQEIHTVSSEAVSLLEEVITPRKDLPPLLLKLNERPAERLDYLGVSYGLTPRLL 
SEG xxxxxxxxxx 

PRD hhhhhccccccchhhhhhhhhhhhhhccccccccccccccccccceeeeccccccchhhh 

SEQ KFWKRAGFVPVYLRQTPNDLTGEHSCIMLKTLTDEDEADQGGWLAAFWKDFRRRFLALLS 

SEG 

PRD hhhhhcccceeeeeccccccccceeeeeeecccccccccchhhhhhhhhhhhhhhhhhhh 

SEQ YQFSTFSPSLALNIIQNRNMGKPAQPALSREELEALFLPYDLKRLEMYSRNMVDYHLIMD 

SEG 

PRD hhhhcchhhhhhhhhhhcccccccchhhhhhhhhhhhccchhhhhhhhhccchhhhhhhh 

SEQ MIPAISRIYFLNQLGDLALSAAQSALLLGIGLQHKSVDQLEKEIELPSGQLMGLFNRIIR 

SEG xxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhcccchhhhhhhhhhhhhhcchhhhhhhhhhhhhccccchhhhhhhhhh 

SEQ KVVKLFNEVQEKAIEEQMVAAKDVVMEPTMKTLSDDLDEAAKEFQEKHKKEVGKLKSMDL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhcc 

SEQ SEYIIRGDDEEWNEVLNKAGPNASIISLKSDKKRKLEAKQEPKQSKKLKNRETKNKKDMK 

SEG xxxxxxxxxxxxxxx 

PRD cceeecccchhhhhhhhhccccceeeeeeccchhhhhhhhcccccccccccccccchhhh 

SEQ LKRKK 

SEG xxxxx 

PRD hhccc 



Prosite for DKFZphtes3_6cll.3 

PS00016 966->969 RGD PDOC00016 
PS00017 284->292 ATP_GTP_A PDOC00017 



(No Pfam data available for DKFZphtes3_6cll.3) 

DKFZphtes3_6dl6 



group: testes derived 

DKFZphtes3_6dl6 encodes a novel 695 amino acid protein nearly identical to a sequence from 
human PAC clone WUGSC:H_DJ1185I07.2. 

The cDNA differs from the proposed gene model: it contains additional exons. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

WUGSC:H_DJ1185I07.2, differences to gene model 

differences to gene model of WUGSC:H_DJ1185I07.2: two exons skipped 

Sequenced by BMFZ 

Locus: /map="7q11.23-q21" 

Insert length: 4572 bp 

Poly A stretch at pos. 4540, polyadenylation signal at pos. 4520 

1 GGCGGCGCTA GCTTCGGAGT CTCCCGCGCG CACCTCAGCC GCCTCCTAGC 
51 GGCGCGGCGC TCGCTCCTAC GCCTAAAATG ACCAATGTGT GATTTCAGTG 

101 GAATAAATGG CGTCCAAAGT CACAGATGCT ATAGTCTGGT ATCAAAAGAA 

151 GATTGGAGCA TATGATCAAC AAATATGGGA AAAATCTGTT GAACAGAGAG 

201 AAATCAAGGG GCTAAGGAAT AAACCAAAGA AAACAGCACA TGTGAAACCA 

251 GACCTCATAG ATGTTGATCT TGTAAGAGGG TCTGCATTTG CAAAGGCAAA 

301 GCCTGAAAGT CCTTGGACTT CTCTGACCAG AAAGGGAATT GTTCGAGTTG 

351 TATTTTTCCC CTTTTTCTTC CGGTGGTGGT TACAAGTAAC ATCAAAGGTC 

401 ATCTTTTTCT GGCTTCTTGT CCTTTATCTT CTTCAAGTTG CTGCAATAGT 

451 ATTATTCTGC TCCACTTCTA GCCCACACAG CATACCTCTG ACAGAGGTGA 

501 TTGGGCCGAT ATGGCTGATG CTGCTCCTGG GAACTGTGCA TTGCCAGATT 

551 GTTTCCACAA GAACACCCAA ACCTCCTCTA AGTACAGGGG GTAAAAGAAG 

601 AAGGAAATTA AGAAAAGCAG CCCATTTGGA AGTACATAGG GAAGGAGATG 

651 GTTCTAGTAC CACAGATAAC ACACAAGAGG GAGCAGTTCA GAACCACGGT 

701 ACAAGCACCT CTCACAGCGT TGGCACTGTC TTCAGAGATC TCTGGCATGC 

751 TGCTTTCTTT TTATCAGGAT CAAAGAAAGC AAAGAATTCA ATTGATAAAT 

801 CAACTGAAAC TGACAATGGC TATGTATCCC TTGATGGGAA GAAGACTGTT 

851 AAAAGCGGTG AAGATGGAAT ACAAAACCAT GAACCTCAGT GTGAAACTAT 

901 TCGACCAGAA GAGACAGCCT GGAACACAGG AACACTGAGG AATGGTCCTA 

951 GCAAAGATAC CCAAAGGACA ATAACAAATG TCTCTGATGA AGTCTCCAGT 
1001 GAGGAAGGTC CTGAAACAGG ATACTCATTA CGTCGTCATG TGGACAGGAC 
1051 TTCTGAAGGT GTTCTTCGGA ATAGAAAGTC ACACCATTAT AAGAAACATT 
1101 ACCCTAATGA GGACGCCCCT AAATCGGGTA CTAGTTGCAG CTCTCGCTGT 
1151 TCAAGTTCCA GACAGGATTC TGAGAGTGCA AGGCCAGAAT CTGAAACAGA 
1201 AGATGTGTTA TGGGAAGACT TGTTACATTG TGCAGAATGC CATTCATCTT 
1251 GTACCAGTGA GACAGATGTG GAAAATCATC AGATTAATCC ATGTGTGAAA 
1301 AAAGAATATA GAGATGACCC TTTTCATCAG AGTCATTTGC CCTGGCTCCA 
1351 TAGTTCCCAC CCAGGATTAG AAAAAATAAG TGCTATAGTA TGGGAAGGTA 
1401 ATGATTGTAA GAAAGCAGAC ATGTCTGTAC TTGAAATCAG TGGAATGATA 
1451 ATGAACAGAG TGAACAGCCA TATACCAGGA ATAGGATACC AGATTTTTGG 
1501 AAATGCAGTC TCTCTCATAC TGGGTTTAAC TCCATTTGTT TTCCGACTTT 
1551 CTCAAGCTAC AGACTTGGAA CAACTCACAG CACATTCTGC TTCAGAACTT 
1601 TATGTGATTG CATTTGGTTC TAATGAAGAT GTCATAGTTC TTTCTATGGT 
1651 TATAATAAGT TTTGTGGTTC GCGTGTCTCT TGTGTGGATT TTCTTTTTTT 
1701 TGCTCTGTGT AGCAGAAAGA ACTTATAAAC AGCGATTACT TTTTGCAAAA 
1751 CTCTTTGGAC ATTTAACATC TGCAAGGAGG GCTCGAAAAT CTGAGGTTCC 
1801 TCATTTCCGG TTGAAGAAAG TACAGAATAT AAAAATGTGG CTATCTCTCC 
1851 GTTCCTATCT TAAGCGTCGA GGTCCTCAGC GATCAGTTGA TGTAATAGTT 
1901 TCATCTGCTT TCTTATTGAC TATCTCAGTT GTATTTATCT GTTGTGCCCA 
1951 GATAAACCTC TACTTGAAAA TGGAGAAAAA ACCTAACAAA AAGGAGGAAC 
2001 TGACACTAGT GAATAATGTT TTAAAACTGG CTACTAAACT GCTAAAGGAG 
2051 TTGGACAGTC CTTTTAGATT ATATGGGCTT ACAATGAATC CGCTGCTTTA 
2101 TAACATCACC CAGGTTGTTA TCCTGTCAGC TGTTTCTGGT GTTATCAGTG 
2151 ACTTGCTTGG ATTTAATTTA AAGCTATGGA AGATTAAGTC ATGACAATTC 
2201 AAAGAAAAGA AGATGTAGCC TCTTTTCCAG AATAAGAGTA CTGACTAAGC 
2251 TGCCTGAAAG CTTGTCACTG ATTCTTTGCT TCAGGAGTCT CAGCTAGGGA 
2301 GTTGAAGTGT TTACATCAGA CTGTCTTGTG CAATTCTTAT ATTTATTTTA 
2351 CTGGTTCACT TTTTTTTACA TTTATTTTAG TCTTTATATT TTTATTTTTA 
2401 AGCATTGATG TACTTAGTTG TTGAAAGGGT GATGAAACTG ATATCCAGAT 
2451 ACTTGAGATC CTGGTAATTG GTCATAAATA ATTGGCAAAA TAACAAATTG 
2501 TGAAAATAGA AGCCATTGCT CAGCACCGTT TCTCCATCAA TGCCGTGAAC 
2551 TTGCCTTACT TGAGGAAAAA TTCTTTAACT TTGGAATATT GCATTGAACT 
2601 CAGCTATACA CATAAAACAT TTTCTTTGGT AAATCAAGAT CCAGTCAGGG 
2651 TTTCTCTTGA ATTATTTTGG AACAATGCCA GGATCCAAAC TGATTAAGTT 
2701 ACAGTTTAAG CACCCTTCAG TATTAATATA TACGGTATTA TATAACAGGT 
2751 CAACAAGTGC TCTTTGATGA TAAAACTTGT AATAGAGCAA TAATTGTAAA 
2801 TGGTTACCAT ACTGTAAGAT ATTTTGATAA AAATTAACTA GTAATACTTG 
2851 TATTTATTTG AAACACTGGG CTGTTTGCAC AGCTCCAACT GTGCATGCTC 
2901 AAAATGTGCA CTTTTTAAAA TTGTTACTTT TAATGCGTAT CTTTATATGG 
2951 GATCTGTTAT AGTATACTAG GGCATGATAT GGTATCCTTT TGAGTGAGGT 
3001 ATATACTCAT CTCACAAGTG AAGTGCCTAC TGATATTACT AAAGTACATT 
3051 ATGTTTACTC AAGTAAATAA TTTTCTCCCC ATGGTACACT CTAGTGTAGG 
3101 CTATTCATAC CACACTGAAA TGAACAACTG AAGAATAAGG CTAAGAACCA 
3151 ATAAAATATT TCTCTAATTG CTAGTTGTAA AACTGTATCC AAATTTTCAG 
3201 AAAAGACAGC TTCAGCTTGC AAATTCTATC CTCTAAACTT ATCTGGTGCA 
3251 TTCTCCCCAC CCCACCCCCA TTATATAAGG GCTATTTTAG ATGCTTTTAA 
3301 CCTCCCCAAC AAATAATTTG CCAAGTGTCC AATGAGAACT TATCATGTTG 
3351 GTGTGTTAGG TAAATCGGGC AAATATGATA GTGTCTTACA TTGGGCCTTG 
3401 ATTTTAAGTT GTTATATTTG TACAATCGAG TATTTTAGAA ATTACATGAA 
3451 ACATGAAACA GTTTTTGCAA TTTTTTTTAA ACTGGGCATC TGGTTTCTAA 
3501 AAATTTATTT GAAACAATCT AGAATTTTCT TGGTGCAAAG TGTATCATGT 
3551 GGAATATCCT CATATTTTTA CCATATTTTA AGAACTTTAA GACGATTAAT 
3601 TGTAAATAAT TTATTTGATT GGTGCAGTTC TAATCCCTAA ATCATAATCT 
3651 TAAAATCAGG AATGTGTGGA GAACAGAGCC ATGTCATATC ACTTTGCTCT 
3701 TACCATTCCT TTTGATCAGC CTCAATTCAG CCTCATTGTG TAGTATGTTT 
3751 TTTCTTTCTA TGAAAAACAA CAGAAAGCAT TTCATTTTAT TTGCCTATGT 
3801 TCAAATATGT TTAATAATGA CCAAAGTGCA TTCTGAGTTT TTTCAAGGAA 
3851 TGTAATACTG GAGCTTTAAG AACATACTTA GTTTCTCATG TGAAAACTTA 
3901 GGCTTTGTCT GATGTTTTTC CTTCCTCTAT TGTCTAATGT TGAGGTTGTT 
3951 TTTAGGAATT ATGTTTTATA AACTTTTTCA ATATAAGGTA CATGCCTATA 
4001 CAGAACTTAA CATTTTGCAC AGAATATATC AAATATATTT TGAGAAAAAA 
4051 AGTACGGCAT GAGTTCTGTT AGGAATAAAA GATGAAACTA TTGTATCTCA 
4101 CAAAAAATCT TATTTCAGAA TGGAAATATT TTTGAGAAAA GTAGCTGAGT 
4151 ATACTGGTTT AAGAAAATGC TTGTTTTAGA TTGAGGTTAA CTTAGAGTTG 
4201 GGAGTTGATT TATTAAGTAC AGTATACCTC TCAACAGTTT ATAAATAATA 
4251 TGTTGAATTA TGTCAGTGTG GGCAGCAGTA GAATACTAAA AGGAAAATGT 
4301 CATGTTAAGC AATTTCAGAA CATTAACTGA ACTATTTTCA AAGCAGAAAA 
4351 ATTGACATTG CTGCCTTTAA GAATACCATG AATGTAAGAA ATTGAAAGAA 
4401 ATTGTAAAAT ATCACATAAT ATAGAAATGG CAGTTCAAAG AGAATTGTGG 
4451 CAGATGTTGT GTGTGAACTG TTGTTTCTTT GCCACATGTG TTGTATTTGA 
4501 AAGTTTTACA GTAAGTTTAA AATAAAACAT TCTGTGACTG AAAAAAAAAA 
4551 AAAAAAAAAA AAAAAAAAAA AA 
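
The poly A stretch and polyadenylation signal positions quoted for this insert can be re-derived from the last two rows of the listing. The following is a minimal sketch, not the annotation pipeline's own method: the function name, the choice of AATAAA/ATTAAA as signal hexamers, and the minimum run length are assumptions. It also assumes that reported positions are 0-based offsets, which agrees with the values quoted for this clone but is not stated in the source.

```python
import re

# Final 72 bases of the insert, copied from the rows numbered 4501 and 4551.
TAIL = "".join(
    "AAGTTTTACA GTAAGTTTAA AATAAAACAT TCTGTGACTG AAAAAAAAAA "
    "AAAAAAAAAA AAAAAAAAAA AA".split()
)
TAIL_OFFSET = 4500  # number of bases that precede TAIL in the full insert


def find_polya_features(tail, offset, min_run=10):
    """Return (signal_pos, polya_pos) as 0-based offsets into the full insert.

    signal_pos: start of the first AATAAA or ATTAAA hexamer found in `tail`.
    polya_pos:  start of the terminal run of at least `min_run` A residues.
    Either value is None when the feature is absent (hypothetical helper).
    """
    signal = re.search(r"AATAAA|ATTAAA", tail)
    run = re.search(r"A{%d,}$" % min_run, tail)
    signal_pos = offset + signal.start() if signal else None
    polya_pos = offset + run.start() if run else None
    return signal_pos, polya_pos


print(find_polya_features(TAIL, TAIL_OFFSET))  # (4520, 4540)
```

In 1-based row numbering the hexamer begins at base 4521 and the A run at base 4541, one higher than the quoted positions.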



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 107 bp to 2191 bp; peptide length: 695 

Category: known protein 

Classification: unclassified 

Prosite motifs: CYTOCHROME_C (375-381) 
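
The frame and peptide length quoted for an ORF follow from its coordinates. A minimal sketch, assuming the coordinates are 1-based and inclusive, that the stop codon is excluded, and that forward frames are numbered 1 to 3 by the offset of the first base (the helper name is hypothetical):

```python
def orf_summary(start, end):
    """Return (frame, codons) for an ORF given 1-based inclusive coordinates."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    frame = (start - 1) % 3 + 1  # frame 1 starts at base 1, frame 2 at base 2, ...
    return frame, length // 3


print(orf_summary(107, 2191))  # (2, 695)
```

For this entry the result agrees with the reported frame 2 and peptide length of 695.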



1 MASKVTDAIV WYQKKIGAYD QQIWEKSVEQ REIKGLRNKP KKTAHVKPDL 
51 IDVDLVRGSA FAKAKPESPW TSLTRKGIVR VVFFPFFFRW WLQVTSKVIF 
101 FWLLVLYLLQ VAAIVLFCST SSPHSIPLTE VIGPIWLMLL LGTVHCQIVS 
151 TRTPKPPLST GGKRRRKLRK AAHLEVHREG DGSSTTDNTQ EGAVQNHGTS 
201 TSHSVGTVFR DLWHAAFFLS GSKKAKNSID KSTETDNGYV SLDGKKTVKS 
251 GEDGIQNHEP QCETIRPEET AWNTGTLRNG PSKDTQRTIT NVSDEVSSEE 
301 GPETGYSLRR HVDRTSEGVL RNRKSHHYKK HYPNEDAPKS GTSCSSRCSS 
351 SRQDSESARP ESETEDVLWE DLLHCAECHS SCTSETDVEN HQINPCVKKE 
401 YRDDPFHQSH LPWLHSSHPG LEKISAIVWE GNDCKKADMS VLEISGMIMN 
451 RVNSHIPGIG YQIFGNAVSL ILGLTPFVFR LSQATDLEQL TAHSASELYV 
501 IAFGSNEDVI VLSMVIISFV VRVSLVWIFF FLLCVAERTY KQRLLFAKLF 
551 GHLTSARRAR KSEVPHFRLK KVQNIKMWLS LRSYLKRRGP QRSVDVIVSS 
601 AFLLTISVVF ICCAQINLYL KMEKKPNKKE ELTLVNNVLK LATKLLKELD 
651 SPFRLYGLTM NPLLYNITQV VILSAVSGVI SDLLGFNLKL WKIKS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_6dl6, frame 2 

PIR:S38170 SRP40 protein - yeast (Saccharomyces cerevisiae), N = 1, 
Score = 100, P = 0.08 

TREMBL:AC004990_1 gene: "WUGSC:H_DJ1185I07.2"; Homo sapiens PAC clone 
DJ1185I07 from 7q11.23-q21, complete sequence., N = 2, Score = 2693, 
P = 0 



>TREMBL:AC004990_1 gene: "WUGSC:H_DJ1185I07.2"; Homo sapiens PAC clone 
DJ1185I07 from 7q11.23-q21, complete sequence. 
Length = 588 

HSPs: 

Score = 2693 (404.1 bits), Expect = 0.0e+00, Sum P(2) = 0.0e+00 
Identities = 510/515 (99%), Positives = 512/515 (99%) 



Query: 35, Sbjct: 1 
GLRNKPKKTAHVKPDLIDVDLVRGSAFAKAKPESPWTSLTRKGIVRVVFFPFFFRWWLQV 

Query: 95, Sbjct: 61 
TSKVIFFWLLVLYLLQVAAIVLFCSTSSPHSIPLTEVIGPIWLMLLLGTVHCQIVSTRTP 

Query: 155, Sbjct: 121 
KPPLSTGGKRRRKLRKAAHLEVHREGDGSSTTDNTQEGAVQNHGTSTSHSVGTVFRDLWH 

Query: 215, Sbjct: 181 
AAFFLSGSKKAKNSIDKSTETDNGYVSLDGKKTVKSGEDGIQNHEPQCETIRPEETAWNT 

Query: 275, Sbjct: 241 
GTLRNGPSKDTQRTITNVSDEVSSEEGPETGYSLRRHVDRTSEGVLRNRKSHHYKKHYPN 

Query: 335, Sbjct: 301 
EDAPKSGTSCSSRCSSSRQDSESARPESETEDVLWEDLLHCAECHSSCTSETDVENHQIN 

Query: 395, Sbjct: 361 
PCVKKEYRDDPFHQSHLPWLHSSHPGLEKISAIVWEGNDCKKADMSVLEISGMIMNRVNS 

Query: 455, Sbjct: 421 
HIPGIGYQIFGNAVSLILGLTPFVFRLSQATDLEQLTAHSASELYVIAFGSNEDVIVLSM 

Query: 515, Sbjct: 481 
VIISFVVRVSLVWIFFFLLCVAERTYKQ L+ K+ 

Score = 409, Expect = 0.0e+00, Sum P(2) = 0.0e+00 
Positives = 98/115 (85%) 

Query: 595, Sbjct: 474 
DVIV S 
+F++ +S+V+I 
C A 
-QINLYLKMEKKPNKKEELTLVNNVLK 640 
QINLYLKMEKKPNKKEELTLVNNVLK 

Query: 641, Sbjct: 534 
LATKLLKELDSPFRLYGLTMNPLLYNITQVVILSAVSGVISDLLGFNLKLWKIKS 



Pedant information for DKFZphtes3_6dl6, frame 2 
Report for DKFZphtes3_6dl6.2 



[LENGTH] 695 

[MW] 78466.68 

[pI] 9.30 

[HOMOL] TREMBL:AC004990_1 gene: "WUGSC:H_DJ1185I07.2"; Homo sapiens PAC clone DJ1185I07 
from 7q11.23-q21, complete sequence. 0.0 

[PROSITE] CYTOCHROME_C 1 

[KW] TRANSMEMBRANE 6 

[KW] LOW_COMPLEXITY 5.32 % 

SEQ MASKVTDAIVWYQKKIGAYDQQIWEKSVEQREIKGLRNKPKKTAHVKPDLIDVDLVRGSA 

SEG 

PRD ccceeeeehhhhhhhcccchhhhhhhhhhhhhhhcccccccccccccccceeeeeeccch 

MEM 

SEQ FAKAKPESPWTSLTRKGIVRVVFFPFFFRWWLQVTSKVIFFWLLVLYLLQVAAIVLFCST 

SEG xxxxxxxxxxx 

PRD hhhhcccccccccccccceeeeecchhhhhhhhhhhhhhhhhhhhhhhhhhhhhheeecc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ SSPHSIPLTEVIGPIWLMLLLGTVHCQIVSTRTPKPPLSTGGKRRRKLRKAAHLEVHREG 

SEG xxxxxxxx 

PRD ccccccceeeeehhhhhhhhhhhhheeeeeeccccccccccchhhhhhhhhhhhheeecc 

MEM MMMMMMMMMMMMMMMMMMMMMMM 

SEQ DGSSTTDNTQEGAVQNHGTSTSHSVGTVFRDLWHAAFFLSGSKKAKNSIDKSTETDNGYV 

SEG 

PRD cccccccccceeeeeeccccccccchhhhhhhhhhhhhhcccchhhhhcccccccccccc 

MEM 

SEQ SLDGKKTVKSGEDGIQNHEPQCETIRPEETAWNTGTLRNGPSKDTQRTITNVSDEVSSEE 

SEG 

PRD cccccceeecccccccccccccccccccceeeeccccccccccccceeeecccccccccc 

MEM 

SEQ GPETGYSLRRHVDRTSEGVLRNRKSHHYKKHYPNEDAPKSGTSCSSRCSSSRQDSESARP 

SEG xxxxxxxxxxxxxxxxxx . . . 

PRD ccccceeeeeeccccccchhhhhhcccccccccccccccccccccccccccccccccccc 

MEM 

SEQ ESETEDVLWEDLLHCAECHSSCTSETDVENHQINPCVKKEYRDDPFHQSHLPWLHSSHPG 

SEG 

PRD cccchhhhhhhhhhhhcccccccccccccccccccceeeeeccccccccccccccccccc 

MEM 

SEQ LEKISAIVWEGNDCKKADMSVLEISGMIMNRVNSHIPGIGYQIFGNAVSLILGLTPFVFR 

SEG 

PRD cccceeeeeecccccccceeeeehhhhhhhhhccccccccccccccccceeecccccchh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ LSQATDLEQLTAHSASELYVIAFGSNEDVIVLSMVIISFVVRVSLVWIFFFLLCVAERTY 

SEG 

PRD hhhhhhhhhhhhcccceeeeeeeccccceeeehhhhhhhhcchhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ KQRLLFAKLFGHLTSARRARKSEVPHFRLKKVQNIKMWLSLRSYLKRRGPQRSVDVIVSS 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhccccccceeeeeehhhhhhhhhhhhccccceeeeeeee 

MEM MMMMMMM 

SEQ AFLLTISVVFICCAQINLYLKMEKKPNKKEELTLVNNVLKLATKLLKELDSPFRLYGLTM 

SEG 

PRD eeeeeeeeeeeeeehhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhccccceeeeccc 

MEM MMMMMMMMMMMMMMMMMMM 

SEQ NPLLYNITQVVILSAVSGVISDLLGFNLKLWKIKS 

SEG 

PRD cchhhhheeeeeeeeecchhhhhccceeeeeeccc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMM 



Prosite for DKFZphtes3_6dl6.2 
PS00190 375->381 CYTOCHROME_C PDOC00169 

(No Pfam data available for DKFZphtes3_6dl6.2) 

DKFZphtes3_72kll 



group: testes derived 

DKFZphtes3_72kll encodes a novel 233 amino acid protein with similarity to an S. pombe 
hypothetical repeat-containing protein. 

The novel protein contains 5 leucine zippers and a microbodies C-terminal targeting signal 
(S-K-L) signature. This sequence is responsible for transport of proteins from free polysomes 
into the microbodies. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to S.pombe hypothetical repeat-containing protein 

complete cDNA, complete cds, 6 EST hits (3 from testis-derived 
libraries) 

Sequenced by DKFZ 

Locus : unknown 

Insert length: 1134 bp 

Poly A stretch at pos. 1124, polyadenylation signal at pos. 1068 



1 AACCTTTCAA 
51 TTCTTGGCCA 
101 TCACTGCCAG 
151 CTGACATGAC 
201 CAGTGGGAAG 
251 GGGTCCTGTC 
301 ATGTTTTCCT 
351 CCTGGCGGCA 
401 TGCAGGAGGA 
451 ATAGAGGACT 
501 TTTCCGGGGC 
551 AAGAGGAGAA 
601 AAGTCTTTCA 
651 CTGGAAGGAG 
701 GAGACCGGAA 
751 GCCCTGTGGG 
801 GGAAGATAAA 
851 AGAGGGCCTT 
901 CTCGAAGATG 
951 CTCCCGAGGC 
1001 GACTCCCCTG 
1051 AAATACACAC 
1101 AGGCAAGGTT 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 268 bp to 966 bp; peptide length: 233 
Category: similarity to known protein 
Prosite motifs: MICROBODIES_CTER (231-234) 
LEUCINE_ZIPPER (142-164) 
LEUCINE_ZIPPER (149-171) 
LEUCINE_ZIPPER (156-178) 
LEUCINE_ZIPPER (163-185) 
LEUCINE_ZIPPER (170-192) 

1 MATPPFRLIR KMFSFKVSRW MGLACFRSLA ASSPSIRQKK LMHKLQEEKA 

51 FREEMKIFRE KIEDFREEMW TFRGKIHAFR GQILGFWEEE RPFWEEEKTF 

101 WKEEKSFWEM EKSFREEEKT FWKKYRTFWK EDKAFWKEDN ALWERDRNLL 

151 QEDKALWEEE KALWVEERAL LEGEKALWED KTSLWEEENA LWEEERAFWM 
201 ENNGHVAGEQ MLEDGPHNAN RGQRLLAFSR GRA 
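
The five overlapping LEUCINE_ZIPPER hits can be reproduced by scanning the peptide for the PROSITE PS00029 pattern L-x(6)-L-x(6)-L-x(6)-L. The sketch below is an illustration only: the regular expression and helper name are assumptions, and it is not the scanner that produced this report. A lookahead is used so that overlapping matches are all reported.

```python
import re

# Peptide for frame 1, copied from the listing above (233 residues).
PEPTIDE = "".join("""
MATPPFRLIR KMFSFKVSRW MGLACFRSLA ASSPSIRQKK LMHKLQEEKA
FREEMKIFRE KIEDFREEMW TFRGKIHAFR GQILGFWEEE RPFWEEEKTF
WKEEKSFWEM EKSFREEEKT FWKKYRTFWK EDKAFWKEDN ALWERDRNLL
QEDKALWEEE KALWVEERAL LEGEKALWED KTSLWEEENA LWEEERAFWM
ENNGHVAGEQ MLEDGPHNAN RGQRLLAFSR GRA
""".split())

# L-x(6)-L-x(6)-L-x(6)-L, wrapped in a lookahead to allow overlapping hits.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def zipper_starts(seq):
    """Return the 1-based start position of every leucine zipper match."""
    return [m.start() + 1 for m in LEUCINE_ZIPPER.finditer(seq)]


print(zipper_starts(PEPTIDE))  # [142, 149, 156, 163, 170]
```

The start positions agree with the motif list. Each match spans 22 residues, so the end coordinates in the listing are one higher than the position of the last leucine in the match.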

BLASTP hits 

Entry SPCC330_4 from database TREMBLNEW: 

gene: "SPCC330.04c"; product: "hypothetical repeat-containing protein" 
S.pombe chromosome III cosmid c330. 

Score = 149, P = 1.6e-08, identities = 55/187, positives = 88/187 

Entry A45973 from database PIR: 
trichohyalin - human 

Score = 147, P = 3.0e-07, identities = 57/194, positives = 94/194 



Alert BLASTP hits for DKFZphtes3_72kll, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_72kll, frame 1 



Report for DKFZphtes3_72kll.1 



[LENGTH] 233 

[MW] 28752.65 

[pI] 5.70 

[PROSITE] LEUCINE_ZIPPER 5 

[PROSITE] MICROBODIES_CTER 1 

[PROSITE] MYRISTYL 1 

[PROSITE] CK2_PHOSPHO_SITE 3 

[PROSITE] PKC_PHOSPHO_SITE 4 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 15.45 % 



SEQ MATPPFRLIRKMFSFKVSRWMGLACFRSLAASSPSIRQKKLMHKLQEEKAFREEMKIFRE 

SEG 

PRD cccccchhhhhhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhh 

SEQ KIEDFREEMWTFRGKIHAFRGQILGFWEEERPFWEEEKTFWKEEKSFWEMEKSFREEEKT 

SEG xxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhcccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhh 

SEQ FWKKYRTFWKEDKAFWKEDNALWERDRNLLQEDKALWEEEKALWVEERALLEGEKALWED 

SEG 

PRD hhhhcccccccccchhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ KTSLWEEENALWEEERAFWMENNGHVAGEQMLEDGPHNANRGQRLLAFSRGRA 

SEG . . .xxxxxxxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhhccccchhhhhhcccccccccchhhhhhhhccc 



Prosite for DKFZphtes3_72kll.1 

PS00005 14->17 PKC_PHOSPHO_SITE PDOC00005 
PS00005 35->38 PKC_PHOSPHO_SITE PDOC00005 
PS00005 71->74 PKC_PHOSPHO_SITE PDOC00005 
PS00005 113->116 PKC_PHOSPHO_SITE PDOC00005 
PS00006 106->110 CK2_PHOSPHO_SITE PDOC00006 
PS00006 113->117 CK2_PHOSPHO_SITE PDOC00006 
PS00006 183->187 CK2_PHOSPHO_SITE PDOC00006 
PS00008 81->87 MYRISTYL PDOC00008 
PS00342 231->234 MICROBODIES_CTER PDOC00299 
PS00029 142->164 LEUCINE_ZIPPER PDOC00029 
PS00029 149->171 LEUCINE_ZIPPER PDOC00029 
PS00029 156->178 LEUCINE_ZIPPER PDOC00029 
PS00029 163->185 LEUCINE_ZIPPER PDOC00029 
PS00029 170->192 LEUCINE_ZIPPER PDOC00029 



(No Pfam data available for DKFZphtes3_72kll.1) 

DKFZphtes3_72kl5 



group: cell structure and motility 

DKFZphtes3_72kl5 encodes a novel 188 amino acid protein with strong similarity to Rattus 
norvegicus actin-filament binding protein Frabin. 

FGD1-related F-actin-binding protein (Frabin) is a novel F-actin-binding protein. The 
gene locus FGD1 seems to be responsible for faciogenital dysplasia, or Aarskog-Scott syndrome. 
Frabin binds F-actin and shows F-actin-cross-linking activity. Overexpression of frabin in 
Swiss 3T3 cells and COS7 cells induces cell shape change and c-Jun N-terminal kinase 
activation, as described for FGD1. Because FGD1 has been shown to serve as a GDP/GTP exchange 
protein for the Cdc42 small G protein, it is likely that frabin is a direct linker between Cdc42 
and the actin cytoskeleton. In yeast, Cdc42p is essential; it transduces signals to the actin 
cytoskeleton to initiate and maintain polarized growth and to mitogen-activated protein 
kinase pathways that control morphogenesis. In mammalian cells, Cdc42p regulates a variety of 
actin-dependent events and induces the JNK/SAPK protein kinase cascade, which leads to the 
activation of transcription factors within the nucleus. 

The novel protein seems to be the human orthologue of rat frabin. 

The new protein can find application in modulating cell structure and motility as well as 
in modulating the JNK/SAPK pathway. 



strong similarity to actin-filament binding protein Frabin 

2 EST hits 

Sequenced by DKFZ 

Locus : unknown 

Insert length: 1845 bp 

Poly A stretch at pos. 1835, polyadenylation signal at pos. 1816 



1 GTGATGGAGA GTGCTGTTAT GATAGATGAA TCTAGGAAAG CCTCTTTGGA 

51 GATGTGATAC CTGAACAGAA CCCCGAATGA TAAGAAGAAA TACCAGTGTT 

101 TTAGGAGAGA TTGTCCTAAG CAGAGAACAG CAGCTGCAAA GACCCCAAGA 

151 CACATACACT TGGTTATTAA GAATGGGAGC AGCAAGGAGT ATGGCAAGAA 

201 CACAGTGAGT TTTCCCTTGA GTGTGTGAGG AAGCCCTCAG AGTTTGTGAC 

251 TGACTTGTAG AGGTTCTAGT GGAGGGGATC AGAGTGGAAA CAAAGAGACC 

301 AGTTAAAAAG GTATGGCAGC ATGAATAAAA AAGTTTTGAG AGTATTCATT 

351 ATGCCTTCCA AATAAAAAAC TCTTTGGTTC ATAATTTGTT CATAAATTAA 

401 GGACTGGCTA CACTGTACTA TTTAAAAATG TTAAGAAACA TCAATAAGTA 

451 AAAATGTTAG GAAGAGATGA TAAATACGTA AGTATTATAT CTAACTAAGT 

501 CTTTACTAAC TAGTCACATT ATTAAACAGT GCAAGGATCA AGAAAAGTTA 

551 AGCGTTGAAA AATAAATAAA TAAGTTATAA ATAAAATAAA CAGCCCAAGG 

601 AAATGTTCCA GTCCCCATAG GTAGACTCGG GGTCATCTTC TTTATTTAAA 
651 TCTTTATTTA AATGTGGATA GCATCCCAAG AGACTTGGGT CTACACTAAG 

701 AATATTCAAA TCCATGTTTC TGAAACCATC AGAGATAGAA AAAAAAAGTA 

751 GCGAATATCC CTTTTCAACT GGAATAAACT TGTCTTAATT CTAGAACTTT 

801 TCCATACCAA TGTTTTCATG CTTCCTTTGT ATTTTATCTT TTAGCTCATT 

851 ATCAAATTAT AGTGATTTGA AGAAAGAGTC TGCTGTGAAC CTAAATGCTC 

901 CTAGAACCCC AGGAAGGCAT GGATTGACAA CCACACCTCA ACAAAAACTC 

951 CTCTCCCAGC ACTTGCCACA GAGGCAGGGA AATGATACAG ATAAGACTCA 

1001 GGGTGCACAG ACTTGTGTGG CCAACGGTGT AATGGCAGCA CAAAACCAGA 

1051 TGGAATGTGA GGAGGAGAAA GCTGCCACTC TTAGCTCAGA TACTTCTATT 

1101 CAAGCTTCTG AACCCTTGCT TGATACGCAC ATAGTGAATG GAGAAAGAGA 

1151 TGAAACTGCC ACAGCTCCTG CATCACCCAC AACAGATAGC TGTGATGGAA 

1201 ATGCTTCTGA CAGTAGCTAC AGGACTCCAG GCATAGGCCC AGTGCTCCCC 

1251 CTAGAAGAAA GAGGGGCAGA AACAGAAACC AAGGTACAAG AGAGGGAAAA 

1301 TGGGGAAAGC CCTCTGGAAC TGGAGCAGCT GGACCAGCAC CATGAGATGA 

1351 AGGTAGAGCA TGAGACTAGC TCATGAGCAG GGAAAACCCT GCCTATTCGA 

1401 TTGTTGTCTT AAAACTCTTT ATTTATTGCA CCCCTGAAAT GTATGAATCA 

1451 GATCACCCAC ACTGGCAGTT AAACGATTTT CAAGCTCTGG CTGCTGATTA 

1501 GCATTTCCCC TATGCTCTAA GCAGATATTT CACTTTTTCT TTTCATGTAG 

1551 TTTCTGTTAA TATCTCTGTT GTAATTTCAG GAGTCAGAAC AGTGTGGAAA 

1601 CTTTAATATA GGAAATCCAC AAATGTATTG TTTTTACATA GAAAGAAAAT 

1651 GTTCCTTGTT GCTCTAGATG TTGGTGCTGT ATCCCTAATA CTTACGGGCC 

1701 AAGCAAGAAG AAATTGTATA ATCTTTGTTG TTCAGAAGTT TCTAATAGAA 

1751 TAAATAGGCC TGTAAGATGA ACTTGCCACT AGTAAATGTT ACTTTTAAGG 

1801 ACATGAATAT GGAAGTATTA AATTATTCAA CAGATAAAAA AAAAA 



BLAST Results 



No BLAST result 
Medline entries 



98334590: 

Frabin, a novel FGD1-related actin filament-binding protein capable of 
changing cell shape and activating c-Jun N-terminal kinase. 



Peptide information for frame 3 



ORF from 810 bp to 1373 bp; peptide length: 188 
Category: similarity to known protein 
Classification: Cell structure/motility 

1 MFSCFLCILS FSSLSNYSDL KKESAVNLNA PRTPGRHGLT TTPQQKLLSQ 
51 HLPQRQGNDT DKTQGAQTCV ANGVMAAQNQ MECEEEKAAT LSSDTSIQAS 
101 EPLLDTHIVN GERDETATAP ASPTTDSCDG NASDSSYRTP GIGPVLPLEE 
151 RGAETETKVQ ERENGESPLE LEQLDQHHEM KVEHETSS 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_72kl5, frame 3 

TREMBL:AF038388_1 product: "actin-filament binding protein Frabin"; 
Rattus norvegicus actin-filament binding protein Frabin mRNA, complete 
cds., N = 1, Score = 428, P = 1.8e-39 



>TREMBL:AF038388_1 product: "actin-filament binding protein Frabin"; Rattus 
norvegicus actin-filament binding protein Frabin mRNA, complete cds. 
Length = 766 

HSPs: 

Score = 428 (64.2 bits), Expect = 1.8e-39, P = 1.8e-39 
Identities = 90/174 (51%), Positives = 115/174 (66%) 

Query: 12 SSLSNYSDLKKESAVNLNAPRTPGRHGLTTTPQQKLLSQHLPQRQGNDTDKTQGAQTCVA 71 

S LS+Y+D++K+S +NLN P+TP +HGLT+T QKL S PQ+Q D+D+ QG C+A 
Sbjct: 31 SVLSSYTDVQKDSTMNLNIPQTPRQHGLTSTTPQKLPSHKSPQKQEKDSDQNQGQHGCLA 90 

Query: 72 NGVMAAQNQMECEEEKAATLSSDTSIQASEPLLDTHIVNGERDETATAPASPTTDSCDGN 131 

NGV AAQ+QMECE EK A LS +T Q + D H++NG R+ET T AS T+S D N 
Sbjct: 91 NGVAAAQSQMECETEKEAALSPETDTQTAAASPDAHVLNGVRNETTTDSASSVTNSHDEN 150 

Query: 132 ASDSSYRTPGIGPVLPLEERGAETETKVQERENGESPLELEQLDQHHEMKVEHE 185 

A DSS RT G LP +E E ++QERENG S L LDQHHE+K +E 

Sbjct: 151 ACDSSCRTQGTDLGLPSKEGEPVIEAELQERENGLSTEGLNPLDQHHEVKETNE 204 
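
The percentages printed next to the Identities and Positives counts in these HSP headers appear to be rounded down rather than rounded to the nearest integer. That is an inference from the figures in this report, not something the source states. A minimal sketch under that assumption (the helper name is hypothetical):

```python
def blast_percent(matches, aligned_length):
    """Integer percentage as printed in the HSP header, assuming truncation."""
    return (100 * matches) // aligned_length


print(blast_percent(90, 174), blast_percent(115, 174))  # 51 66
```

Rounding to the nearest integer would give 52% for 90/174, which does not match the header.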



Pedant information for DKFZphtes3_72kl5, frame 3 



Report for DKFZphtes3_72kl5.3 



[LENGTH] 188 

[MW] 20388.32 

[pI] 4.62 

[HOMOL] TREMBL:AF038388_1 product: "actin-filament binding protein Frabin"; Rattus 
norvegicus actin-filament binding protein Frabin mRNA, complete cds. 2e-38 

[KW] All_Alpha 

[KW] SIGNAL_PEPTIDE 16 

[KW] LOW_COMPLEXITY 12.77 % 



SEQ MFSCFLCILSFSSLSNYSDLKKESAVNLNAPRTPGRHGLTTTPQQKLLSQHLPQRQGNDT 

SEG .xxxxxxxxxxxxxx 

PRD ccchhhhhcccccccccccccccccccccccccccccccccccchhhhhhhccccccccc 

SEQ DKTQGAQTCVANGVMAAQNQMECEEEKAATLSSDTSIQASEPLLDTHIVNGERDETATAP 
SEG xxxxx 

PRD ccccccceeecchhhhhhhhhhhhhhhhhhhccccceeecccccceeeeecccccccccc 

SEQ ASPTTDSCDGNASDSSYRTPGIGPVLPLEERGAETETKVQERENGESPLELEQLDQHHEM 

SEG xxxxx 

PRD ccccccccccccccccccccccccccccccccchhhhhhhhhcccccchhhhhhhhhhhh 

SEQ KVEHETSS 

SEG 

PRD hhhhhccc 

(No Prosite data available for DKFZphtes3_72kl5.3) 
(No Pfam data available for DKFZphtes3_72kl5.3) 
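
The LOW_COMPLEXITY keyword value can be cross-checked against the SEG mask lines of the same report, where each "x" marks one masked residue. A minimal sketch, assuming the percentage is simply masked residues over peptide length rounded to two decimals; the function name is hypothetical and the mask strings are transcribed from the SEG lines above (14, 5 and 5 masked residues, with the last SEG line empty):

```python
# SEG mask lines for DKFZphtes3_72kl5, frame 3, in listing order.
SEG_LINES = ["." + "x" * 14, "x" * 5, "x" * 5, ""]


def low_complexity_percent(seg_lines, length):
    """Percentage of residues masked as low complexity (marked 'x')."""
    masked = sum(line.count("x") for line in seg_lines)
    return round(100 * masked / length, 2)


print(low_complexity_percent(SEG_LINES, 188))  # 12.77
```

The same arithmetic reproduces the 5.32 % quoted for the 695-residue peptide of DKFZphtes3_6dl6, whose SEG lines mask 37 residues.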






DKFZphtes3_72pl6 



group: intracellular transport and trafficking 

DKFZphtes3_72pl6 encodes a novel 796 amino acid protein with very strong similarity to Mus 
musculus maternal-embryonic 3 (Mem3) gene. 

Mem3 was isolated from a partial subtraction library of mouse unfertilized eggs and 
preimplantation embryos. Its transcript is abundant in the unfertilized egg and is also actively 
transcribed from the newly formed zygotic genome. Like Mem3, the novel protein is similar to 
yeast VPS35 (vacuolar protein sorting 35). In yeast, the null allele of VPS35 results in a 
differential defect in the sorting of vacuolar carboxypeptidase Y (CPY), proteinase A (PrA), 
proteinase B (PrB), and alkaline phosphatase (ALP). 

The new protein can find application in modulating the sorting of proteins into different 
compartments. 

strong similarity to mouse Mem3 and yeast VPS35 
Sequenced by DKFZ 
Locus: /map="16p13.3" 
Insert length: 2707 bp 

Poly A stretch at pos. 2697, no polyadenylation signal found 



1 CTACGCGCGG GGCGGGTGCT GCTTGCTGCA GGCTCTGGGG AGTCGCCATG 
51 CCTACAACAC AGCAGTCCCC TCAGGATGAG CAGGAAAAGC TCTTGGATGA 
101 AGCCATACAG GCTGTGAAGG TCCAGTCATT CCAAATGAAG AGATGCCTGG 
151 ACAAAAACAA GCTTATGGAT TCTCTAAAAC ATGCTTCTAA TATGCTTGGT 
201 GAACTCCGGA CTTCTATGTT ATCACCAAAG AGTTACTATG AACTTTATAT 
251 GGCCATTTCT GATGAACTGC ACTACTTGGA GGTCTACCTG ACAGATGAGT 
301 TTGCTAAAGG AAGGAAAGTG GCAGATCTCT ACGAACTTGT ACAGTATGCT 
351 GGAAACATTA TCCCAAGGCT TTACCTTTTG ATCACAGTTG GAGTTGTATA 
401 TGTCAAGTCA TTTCCTCAGT CCAGGAAGGA TATTTTGAAA GATTTGGTAG 
451 AAATGTGCCG TGGTGTGCAA CATCCCTTGA GGGGTCTGTT TCTTCGAAAT 
501 TACCTTCTTC AGTGTACCAG AAATATCTTA CCTGATGAAG GAGAGCCAAC 
551 AGATGAAGAA ACAACTGGTG ACATCAGTGA TTCCATGGAT TTTGTACTGC 
601 TCAACTTTGC AGAAATGAAC AAGCTCTGGG TGCGAATGCA GCATCAGGGA 
651 CATAGCCGAG ATAGAGAAAA AAGAGAACGA GAAAGACAAG AACTGAGAAT 
701 TTTAGTGGGA ACAAATTTGG TGCGCCTCAG TCAGTTGGAA GGTGTAAATG 
751 TGGAACGTTA CAAACAGATT GTTTTGACTG GCATATTGGA GCAAGTTGTA 
801 AACTGTAGGG ATGCTTTGGC TCAAGAATAT CTCATGGAGT GTATTATTCA 
851 GGTTTTCCCT GATGAATTTC ACCTCCAGAC TTTGAATCCT TTTCTTCGGG 
901 CCTGTGCTGA GTTACACCAG AATGTAAATG TGAAGAACAT AATCATTGCT 
951 TTAATTGATA GATTAGCTTT ATTTGCTCAC CGTGAAGATG GACCTGGAAT 
1001 CCCAGCGGAT ATTAAACTTT TTGATATATT TTCACAGCAG GTGGCTACAG 
1051 TGATACAGTC TAGACAAGAC ATGCCTTCAG AGGATGTTGT ATCTTTACAA 
1101 GTCTCTCTGA TTAATCTTGC CATGAAATGT TACCCTGATC GTGTGGACTA 
1151 TGTTGATAAA GTTCTAGAAA CAACAGTGGA GATATTCAAT AAGCTCAACC 
1201 TTGAACATAT TGCTACCAGT AGTGCAGTTT CAAAGGAACT CACCAGACTT 
1251 TTGAAAATAC CAGTTGACAC TTACAACAAT ATTTTAACAG TCTTGAAATT 
1301 AAAACATTTT CACCCACTCT TTGAGTACTT TGACTACGAG TCCAGAAAGA 
1351 GCATGAGTTG TTATGTGCTT AGTAATGTTC TGGATTATAA CACAGAAATT 
1401 GTCTCTCAAG ACCAGGTGGA TTCCATAATG AATTTGGTAT CCACGTTGAT 
1451 TCAAGATCAG CCAGATCAAC CTGTAGAAGA CCCTGATCCA GAAGATTTTG 
1501 CTGATGAGCA GAGCCTTGTG GGCCGCTTCA TTCATCTGCT GCGCTCTGAG 
1551 GACCCTGACC AGCAGTACTT GATTTTGAAC ACAGCACGAA AACATTTTGG 
1601 AGCTGGTGGA AATCAGCGGA TTCGCTTCAC ACTGCCACCT TTGGTATTTG 
1651 CAGCTTACCA GCTGGCTTTT CGATATAAAG AGAATTCTAA AGTGGATGAC 
1701 AAATGGGAAA AGAAATGCCA GAAGATTTTT TCATTTGCCC ACCAGACTAT 
1751 CAGTGCTTTG ATCAAAGCAG AGCTGGCAGA ATTGCCCTTA AGACTTTTTC 
1801 TTCAAGGAGC ACTAGCTGCT GGGGAAATTG GTTTTGAAAA TCATGAGACA 
1851 GTCGCATATG AATTCATGTC CCAGGCATTT TCTCTGTATG AAGATGAAAT 
1901 CAGCGATTCC AAAGCACAGC TAGCTGCCAT CACCTTGATC ATTGGCACTT 
1951 TTGAAAGGAT GAAGTGCTTC AGTGAAGAGA ATCATGAACC TCTGAGGACT 
2001 CAGTGTGCCC TTGCTGCATC CAAACTTCTA AAGAAACCTG ATCAGGGCCG 
2051 AGCTGTGAGC ACCTGTGCAC ATCTCTTCTG GTCTGGCAGA AACACGGACA 
2101 AAAATGGGGA GGAGCTTCAC GGAGGCAAGA GGGTAATGGA GTGCCTAAAA 
2151 AAAGCTCTAA AAATAGCAAA TCAGTGCATG GACCCCTCTC TACAAGTGCA 
2201 GCTTTTTATA GAAATTCTGA ACAGATATAT CTATTTTTAT GAAAAGGAAA 
2251 ATGATGCGGT AACAATTCAG GTTTTAAACC AGCTTATCCA AAAGATTCGA 
2301 GAAGACCTCC CGAATCTTGA ATCCAGTGAA GAAACAGAGC AGATTAACAA 
2351 ACATTTTCAT AACACACTGG AGCATTTGCG CTTGCGGCGG GAATCACCAG 
2401 AATCCGAGGG GCCAATTTAT GAAGGTCTCA TCCTTTAAAA AGGAAATAGC 
2451 TCACCATACT CCTTTCCATG TACATCCAGT GAGGGTTTTA TTACGCTAGG 
2501 TTTCCCTTCC ATAGATTGTG CCTTTCAGAA ATGCTGAGGT AGGTTTCCCA 
2551 TTTCTTACCT GTGATGTGTT TTACCCAGCA CCTCCGGACA CTCACCTTCA 
2601 GGACCTTAAT AAAATTATTC ACTTGGTAAG TGTTCAAGTC TTTCTGATCA 
2651 CCCCAAGTAG CATGACTGAT CTGCAATTTA AAATTCCTGT GATCTGTAAA 
2701 AAAAAAA 



BLAST Results 



Entry AC007225 from database EMBLNEW: 

Homo sapiens chromosome 16 clone 480G7, WORKING DRAFT SEQUENCE, 38 
unordered pieces. 
Score = 1081, P = 2.8e-217, identities = 219/221 
13 exons 

Entry HS015146 from database EMBL: 
human STS WI-8848. 
Score = 2033, P = 2.9e-87, identities = 425/436 



Medline entries 



96327632: 

Genetic mapping and embryonic expression of a novel, maternally 
transcribed gene Mem3. 

97258867: 

Endosome to Golgi retrieval of the vacuolar protein sorting receptor, 
Vps10p, requires the function of the VPS29, VPS30, and VPS35 gene products. 

92360909: 

Alternative pathways for the sorting of soluble vacuolar proteins in 
yeast: a vps35 null mutant missorts and 
secretes only a subset of vacuolar hydrolases. 

10198044: 

Distinct Domains within Vps35p Mediate the Retrieval of Two Different 
Cargo Proteins from the Yeast Prevacuolar/Endosomal Compartment 



Peptide information for frame 3 



ORF from 48 bp to 2435 bp; peptide length: 796 
Category: strong similarity to known protein 
Classification: unset 
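
The ORF coordinates and the peptide length quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming the coordinates are 1-based and inclusive and that the quoted interval does not include the stop codon; the helper name `orf_codon_count` is hypothetical and not part of the original report.

```python
def orf_codon_count(start_bp: int, end_bp: int) -> int:
    """Return the number of complete codons in an inclusive, 1-based interval.

    Assumption: the interval covers only coding codons (no stop codon).
    """
    length = end_bp - start_bp + 1
    if length % 3 != 0:
        raise ValueError(f"interval of {length} bp is not a whole number of codons")
    return length // 3


# ORF from 48 bp to 2435 bp, as quoted for frame 3 of this clone.
print(orf_codon_count(48, 2435))
```

Under these assumptions the interval spans 2388 bp, which corresponds to the stated peptide length of 796 residues.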



1 MPTTQQSPQD EQEKLLDEAI QAVKVQSFQM KRCLDKNKLM DSLKHASNML 
51 GELRTSMLSP KSYYELYMAI SDELHYLEVY LTDEFAKGRK VADLYELVQY 
101 AGNIIPRLYL LITVGVVYVK SFPQSRKDIL KDLVEMCRGV QHPLRGLFLR 
151 NYLLQCTRNI LPDEGEPTDE ETTGDISDSM DFVLLNFAEM NKLWVRMQHQ 
201 GHSRDREKRE RERQELRILV GTNLVRLSQL EGVNVERYKQ IVLTGILEQV 
251 VNCRDALAQE YLMECIIQVF PDEFHLQTLN PFLRACAELH QNVNVKNIII 
301 ALIDRLALFA HREDGPGIPA DIKLFDIFSQ QVATVIQSRQ DMPSEDVVSL 
351 QVSLINLAMK CYPDRVDYVD KVLETTVEIF NKLNLEHIAT SSAVSKELTR 
401 LLKIPVDTYN NILTVLKLKH FHPLFEYFDY ESRKSMSCYV LSNVLDYNTE 
451 IVSQDQVDSI MNLVSTLIQD QPDQPVEDPD PEDFADEQSL VGRFIHLLRS 
501 EDPDQQYLIL NTARKHFGAG GNQRIRFTLP PLVFAAYQLA FRYKENSKVD 
551 DKWEKKCQKI FSFAHQTISA LIKAELAELP LRLFLQGALA AGEIGFENHE 
601 TVAYEFMSQA FSLYEDEISD SKAQLAAITL IIGTFERMKC FSEENHEPLR 
651 TQCALAASKL LKKPDQGRAV STCAHLFWSG RNTDKNGEEL HGGKRVMECL 
701 KKALKIANQC MDPSLQVQLF IEILNRYIYF YEKENDAVTI QVLNQLIQKI 
751 REDLPNLESS EETEQINKHF HNTLEHLRLR RESPESEGPI YEGLIL 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_72pl6, frame 3 

TREMBL:AF024504_3 gene: "A_TM017A05.7"; Arabidopsis thaliana BAC 
TM017A05., N = 2, Score = 927, P = 1.9e-162 

PIR:S56936 vacuolar protein-sorting protein VPS35 - yeast 
(Saccharomyces cerevisiae), N = 3, Score = 826, P = 1.5e-116 

TREMBL:MM47024_1 gene: "Mem3"; product: "MEM3"; Mus musculus 
maternal-embryonic 3 (Mem3) mRNA, complete cds., N = 1, Score = 3376, P = 0 

TREMBL:S42186_1 gene: "VPS35"; product: "Vps35p"; VPS35=vacuolar 
protein sorting [Saccharomyces cerevisiae=yeast, Genomic, 3790 nt], N = 
3, Score = 813, P = 4.4e-115 



>TREMBL:MM47024_1 gene: "Mem3"; product: "MEM3"; Mus musculus 
maternal-embryonic 3 (Mem3) mRNA, complete cds. 
Length = 754 

HSPs: 

Score = 3376 (506.5 bits), Expect = 0.0e+00, P = 0.0e+00 
Identities = 666/721 (92%), Positives = 682/721 (94%) 



Query: 78 
       +VYLTDEFAKG ++ADLYELVQY+GNIIPRLYLLITVGWYVKSFPQSRKDILKDLVEMC 
Sbjct: 34 

Query: 138 
       RGVQHPLRGLFLRNYLLQCTRNILPDEGEPTDEETTGDISDSMDFVLLNFAEMNKLWVRM 
Sbjct: 94 

Query: 198 
       QHQGHSRDREKRERERQELRI LVGTNLV L+ + +QI VLTG I LEQV VNCRDA 
Sbjct: 154 

Query: 257 
       LAQE MECIIQVFPDEFHLQTLNPFLRACAELHQNVNVKNIIIALIDRLALFAHRE P 
Sbjct: 214 

Query: 317 
       GIPA++KLFDIFSQQVATVIQSR+DMPSEDVVSLQVSLINLAMKCYPDRVDYVDKVLETT 
Sbjct: 274 

Query: 377 
       VEIFNKLNLEHIATSSAVSKELTRLLKIPVDTYNNILTVLKLKHFHPLFEYFDYES 
Sbjct: 334 

Query: 435 
       SMSCYVLSNVLDYNTEIVSQDQVDSIMNLVSTLIQDQPDQPVEDPDPEDFADEQSLVGRF 
Sbjct: 394 

Query: 495 
       IHLLRS+DPDQQYLILNTARKHFGAGGNQRIRFTLPPLVFAAYQLAFRYKENSK 
Sbjct: 454 

Query: 555 
       ++ F HQTISALIKAELAELPLRLFLQGALAAGEIGFENHETVAYEFMSQAFSLY 
Sbjct: 514 

Query: 615 
       EDEISDSKAQLAAITLIIGTFERMKCFSEENHEPLRT+CALAASKLLKKPDQ 
Sbjct: 574 

Query: 675 
       L WSGRNTDKNGEELHGGKRVMECLKKALKIANQCMDPSLQVQLFI EILNRYIYFYEKE 
Sbjct: 634 

Query: 735 
       NDAVTIQVLNQLIQKIREDLPNLESSEETEQINKHFHNTLEHLR RRESPESEGPIYEGL 
Sbjct: 693 

Query: 795 
       IL 
Sbjct: 753 
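
The Identities and Positives percentages in the HSP headers of this report can be reproduced from the raw counts. The sketch below assumes that the percentages are truncated rather than rounded; this is an inference from the printed values (682/721 is about 94.6%, yet the header shows 94%), not a documented property of the BLAST version used. The function name `hsp_percent` is hypothetical.

```python
def hsp_percent(matches: int, aligned_length: int) -> int:
    """Integer percentage of matching positions, truncated toward zero.

    Assumption: the report truncates instead of rounding, inferred from
    the printed values only.
    """
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned_length


# Counts taken from the Mem3 HSP header: 666/721 identities, 682/721 positives.
print(hsp_percent(666, 721), hsp_percent(682, 721))
```

With ordinary rounding the second value would come out one point higher than the header states, which is why truncation is assumed here.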



Pedant information for DKFZphtes3_72pl6, frame 3 

Report for DKFZphtes3_72pl6.3 

[LENGTH] 796 
[MW] 91723.67 
[pI] 5.32 
[HOMOL] TREMBL:MM47024_1 gene: "Mem3"; product: "MEM3"; Mus musculus maternal-embryonic 3 (Mem3) mRNA, complete cds. 0.0 
[FUNCAT] 30.25 vacuolar and lysosomal organization [S. cerevisiae, YJL154c] 1e-110 
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YJL154c] 1e-110 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YJL154c] 1e-110 
[FUNCAT] 30.22 endosomal organization [S. cerevisiae, YJL154c] 1e-110 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YJL154c] 1e-110 
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YJL154c] 1e-110 
[FUNCAT] 09.07 biogenesis of endoplasmatic reticulum [S. cerevisiae, YJL154c] 1e-110 
[BLOCKS] BL01092Q 
[PIRKW] yeast vacuole 1e-108 
[PIRKW] membrane protein 1e-108 
[KW] TRANSMEMBRANE 1 
[KW] LOW_COMPLEXITY 5.40 % 



SEQ MPTTQQSPQDEQEKLLDEAIQAVKVQSFQMKRCLDKNKLMDSLKHASNMLGELRTSMLSP 

SEG 

PRD cccccccccchhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhcccc 

MEM 

SEQ KSYYELYMAISDELHYLEVYLTDEFAKGRKVADLYELVQYAGNIIPRLYLLITVGVVYVK 

SEG 

PRD cceeeeehhhhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhcccccccceeeeeceeeee 

MEM MMMMMMMMMMMMMM 

SEQ SFPQSRKDILKDLVEMCRGVQHPLRGLFLRNYLLQCTRNILPDEGEPTDEETTGDISDSM 

SEG xxxxxxxxxxxxxx 

PRD ecccchhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhcccccccccccccccccccch 

MEM MMMMMMMMMM 

SEQ DFVLLNFAEMNKLWVRMQHQGHSRDREKRERERQELRILVGTNLVRLSQLEGVNVERYKQ 

SEG xxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhccchhhhhhhhccchhhhhh 

MEM 



SEQ IVLTGILEQVVNCRDALAQEYLMECIIQVFPDEFHLQTLNPFLRACAELHQNVNVKNIII 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccchhhhhhhhhhhhccccchhhhhh 

MEM 

SEQ ALIDRLALFAHREDGPGIPADIKLFDIFSQQVATVIQSRQDMPSEDVVSLQVSLINLAMK 

SEG 

PRD hhhhhhhhhhhhcccccccccchhhhhhhhhhhhhhhhccccccccchhhhhhhhhhhhh 

MEM 

SEQ CYPDRVDYVDKVLETTVEIFNKLNLEHIATSSAVSKELTRLLKIPVDTYNNILTVLKLKH 

SEG 

PRD cccccccchhhhhhhhhhhhhccchhhhhhccchhhhhhhhhccccccchhhhhhhhhhh 

MEM 

SEQ FHPLFEYFDYESRKSMSCYVLSNVLDYNTEIVSQDQVDSIMNLVSTLIQDQPDQPVEDPD 

SEG xxxxxxxxxxxx 

PRD hhhheeecccchhhhhhhhhhhhccccceeehhhhhhhhhhhhhhhhhhccccccccccc 

MEM 

SEQ PEDFADEQSLVGRFIHLLRSEDPDQQYLILNTARKHFGAGGNQRIRFTLPPLVFAAYQLA 

SEG xxx 

PRD ccccchhhhhhhhhhhhhhccccchhhhhhhhhhhhhcccccceeeeeccchhhhhhhhh 

MEM 

SEQ FRYKENSKVDDKWEKKCQKIFSFAHQTISALIKAELAELPLRLFLQGALAAGEIGFENHE 

SEG 

PRD hhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccc 

MEM 

SEQ TVAYEFMSQAFSLYEDEISDSKAQLAAITLIIGTFERMKCFSEENHEPLRTQCALAASKL 

SEG 

PRD eeeeehhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhh 

MEM 

SEQ LKKPDQGRAVSTCAHLFWSGRNTDKNGEELHGGKRVMECLKKALKIANQCMDPSLQVQLF 

SEG 

PRD hhcccceeeeecccccccccccccccccccccchhhhhhhhhhhhhhhhhhchhhhhhhh 

MEM 

SEQ IEILNRYIYFYEKENDAVTIQVLNQLIQKIREDLPNLESSEETEQINKHFHNTLEHLRLR 
SEG 

PRD hhhhhhhhhhhccccceeeeehhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhh 

MEM 

SEQ RESPESEGPIYEGLIL 

SEG 

PRD hhcccccccceeeccc 

MEM 



(No Prosite data available for DKFZphtes3_72pl6.3) 
(No Pfam data available for DKFZphtes3_72pl6. 3) 
DKFZphtes3_7b22 



group: cell structure and motility 

DKFZphtes3_7b22 encodes a novel 443 amino acid protein with weak similarity to paramyosins. 

The novel protein is related to paramyosin, a major structural component of the thick filaments 
of invertebrate muscle. Paramyosins are promising antigens for immunization against several 
parasites, such as Schistosoma mansoni. 

The new protein can find application in modulating cell adhesion/motility and membrane/cytoskeleton 
structure and dynamics. 



similarity to paramyosins 

complete cDNA, complete cds, few EST hits 

Sequenced by BMFZ 

Locus: /map="3" 

Insert length: 2291 bp 

Poly A stretch at pos. 2241, polyadenylation signal at pos. 2213 



1 GGAAGAAAGG CTAGCGGGCG TTGGCCGTAT GTGGGTGTCT TGAGGCAGTT 
51 TTTCAGTTCT TTCATTTACC AAAGTGACAT GCACCTACTA GGTGCCAGGT 
101 GTTTAGACGT ACATACAACC CTCTGCAAAA TCTTTCAGTG TAGTCCTCTG 
151 TATGAAAAGT TTCCAGCCAA GAATTGCCAC TGCACCTGAG ATAAGGGGGA 
201 TCCTGGCCAT TAAGGAAACC TTGCCTTCGA AACTGAGCCG TGAGGAACTA 
251 TACAAAATGG GAAATTGGGA CAAATCCCAG TGGCTCATGA CACTAAGAAG 
301 TAAAATTACG AACTCACTGA GCTGGAAGTC ATTCAACGGG AATTGAATAG 
351 GTAACTGCAC TTTTGTGAGA TTATAAATAT ACCACGGAGG GTAACGAAGC 
401 TACAGAAGAA TGGAAGAAGA CAGCCTGGAA GACTCAAACC TTCCTCCAAA 
451 AGTTTGGCAT TCTGAGATGA CGGTGTCAGT GACAGGCGAA CCACCTAGTA 
501 CCGTAGAAGA AGAAGGAATA CCTAAAGAAA CAGACATAGA AATCATCCCA 
551 GAAATCCCGG AAACTCTAGA GCCACTGTCC CTTCCAGATG TGCTGAGGAT 
601 CTCGGCAGTT CTGGAGGACA CCACAGACCA GCTCTCTATT CTGAACTACA 
651 TCATGCCCGT TCAGTACGAA GGGAGACAGA GCATCTGCGT GAAAAGCAGA 
701 GAAATGAATC TAGAAGGAAC GAATCTAGAC AAACTTCCAA TGGCCTCAAC 
751 AATCACAAAA ATACCCAGTC CGTTAATAAC TGAGGAAGGA CCCAACTTGC 
801 CAGAAATCAG ACACAGAGGC CGGTTCGCTG TGGAGTTTAA CAAAATGCAG 
851 GATCTTGTCT TCAAAAAACC TACAAGGCAG ACCATCATGA CTACGGAGAC 
901 ACTGAAGAAA ATTCAGATTG ATAGGCAGTT TTTCAGCGAT GTGATTGCAG 
951 ATACCATTAA GGAGTTGCAA GATTCGGCCA CTTACAACAG TCTCCTGCAA 
1001 GCTTTGAGCA AAGAGAGGGA AAACAAAATG CATTTCTATG ACATCATTGC 
1051 CAGGGAGGAA AAAGGAAGAA AACAGATAAT ATCACTTCAA AAACAGCTAA 
1101 TTAATGTCAA AAAGGAATGG CAATTTGAAG TCCAGAGTCA GAATGAGTAT 
1151 ATTGCTAACC TCAAGGACCA ACTGCAAGAG ATGAAGGCAA AATCCAACTT 
1201 GGAGAATCGC TACATGAAAA CCAATACCGA GCTGCAGATT GCCCAGACCC 
1251 AGAAAAAGTG TAACAGAACA GAGGAACTCT TGGTGGAAGA GATTGAGAAA 
1301 CTCAGGATGA AAACCGAAGA AGAGGCCCGG ACTCATACAG AGATTGAAAT 
1351 GTTCCTTAGA AAGGAGCAGC AGAAACTTGA GGAGAGGCTG GAGTTCTGGA 
1401 TGGAGAAATA CGATAAGGAC ACAGAAATGA AACAGAATGA ACTAAATGCT 
1451 CTCAAAGCCA CAAAGGCCAG TGACTTAGCA CACCTTCAAG ACCTGGCAAA 
1501 GATGATAAGA GAGTATGAAC AGGTCATCAT TGAAGATCGT ATAGAAAAGG 
1551 AGAGGAGCAA GAAGAAGGTA AAACAGGATC TCTTGGAATT AAAGAGCGTT 
1601 ATAAAGCTCC AGGCCTGGTG GCGAGGCACT ATGATACGGA GAGAAATTGG 
1651 TGGTTTCAAG ATGCCTAAAG ACAAAGTTGA TAGCAAGGAT TCAAAAGGCA 
1701 AAGGTAAAGG CAAGGATAAG AGGAGAGGCA AGAAGAAGTG ACCAAGTTCT 
1751 CTTTTGTGTT TTCTGCTGGT ATTCTGGAGG TGGGAAGGAC TTGGAGAGTT 
1801 AAGAAACACC TGGTACCTCA AAGATGACTC ATCTACAGGT TGTTTCCTAT 
1851 TGAGACTTTC CCAGGGAAGC CTGATTTCAC TTTGCCTGTT AATTTCACTC 
1901 TGCCTGTTAG GTGGGTTTTC AAACCCTGAT TTAGGATTAC ACCATTGACT 
1951 TAGGGCTTCC TCATACCTTG CTGGGAAGAA GTTTCTAGTA GTCCTGTGAA 
2001 GATTCATTCT TCTTGCTCTT TCTCAGCAGA ACAAAGGAGT TCACTGGCTT 
2051 AGCTACAGTG ACGCATTGAA ACTTGAGTAA TTCCTGTAAT GTCAGATTTT 
2101 GATTTTACCC AATTTGTCTG TAGTGAAAAA ACTCTTATGA GCAAAAGTAT 
2151 TCAGTAGGAA TTACAATATG ATGTTATTAG CTGTCCAGCA TAATATATAC 
2201 ACAGCAAAGT TTTAATAAAT GTTGGTTCCT GCCTGCCTTT TAAAAAAAAA 
2251 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA A 



BLAST Results 



Entry G36731 from database EMBL: 
SHGC-52923 Human Homo sapiens STS cDNA. 
Score = 2262, P = 1.3e-97, identities = 462/468 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 410 bp to 1738 bp; peptide length: 443 
Category: similarity to known protein 
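
The relationship between the ORF start at 410 bp and the first residues of the peptide can be illustrated by translating the opening codons. This is a minimal sketch using the standard genetic code; the names `CODON_TABLE`, `translate` and `orf_start` are hypothetical, and the nucleotide string is the stretch of the cDNA listing that begins at position 410 (the last base of the first block on the row numbered 401, followed by the next four blocks).

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, ignoring blanks and a trailing partial codon."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Bases 410 onward of the cDNA listing (assumption: 1-based positions).
orf_start = "A TGGAAGAAGA CAGCCTGGAA GACTCAAACC TTCCTCCAAA"
print(translate(orf_start))
```

The translated residues can be compared directly with the start of the peptide listed below for frame 2.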



1 MEEDSLEDSN LPPKVWHSEM TVSVTGEPPS TVEEEGIPKE TDIEIIPEIP 

51 ETLEPLSLPD VLRISAVLED TTDQLSILNY IMPVQYEGRQ SICVKSREMN 

101 LEGTNLDKLP MASTITKIPS PLITEEGPNL PEIRHRGRFA VEFNKMQDLV 

151 FKKPTRQTIM TTETLKKIQI DRQFFSDVIA DTIKELQDSA TYNSLLQALS 

201 KERENKMHFY DIIAREEKGR KQIISLQKQL INVKKEWQFE VQSQNEYIAN 

251 LKDQLQEMKA KSNLENRYMK TNTELQIAQT QKKCNRTEEL LVEEIEKLRM 

301 KTEEEARTHT EIEMFLRKEQ QKLEERLEFW MEKYDKDTEM KQNELNALKA 

351 TKASDLAHLQ DLAKMIREYE QVIIEDRIEK ERSKKKVKQD LLELKSVIKL 

401 QAWWRGTMIR REIGGFKMPK DKVDSKDSKG KGKGKDKRRG KKK 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_7b22, frame 2 
SWISSPROT:MYSP_BRUMA PARAMYOSIN., N = 1, Score = 158, P = 5.8e-08 

PIR:A44972 paramyosin - nematode (Dirofilaria immitis) (fragment), N = 
1, Score = 157, P = 7.1e-08 

SWISSPROT:MYSP_ONCVO PARAMYOSIN., N = 1, Score = 157, P = 7.4e-08 

PIR:S52537 emm L 15 protein - Streptococcus pyogenes, N = 1, Score = 
151, P = 8.6e-08 



>SWISSPROT:MYSP_BRUMA PARAMYOSIN. 
Length = 880 

HSPs: 



Score = 158 (23.7 bits), Expect = 5.8e-08, P = 5.8e-08 
Identities = 66/259 (25%), Positives = 125/259 (48%) 



Query: 142 EFNKMQDLVFKKPTRQTIMTTETLKKIQIDRQFFSDVIADTIKELQDSATYNSLLQALSK 201 
           + K + L K R T E K++ + +D +A + LQ A N LL+ + 
Sbjct: 169 QLKKDKHLAEKAAERFEAQTVELSNKVEDLNRHVND-LAQQRQRLQ--AENNDLLKEIHD 225 

Query: 202 ER ENKMHF-YDIIAREEKGRKQIISLQKQLINVKKEWQFEVQSQNEYIANLKDQLQE 257 
           ++ +N H Y + + E+ R+++ +++ ++ + +VQ + + + D+ E 
Sbjct: 226 QKVQLDNLQHVKYQLAQQLEEARRRLEDAERERSQLQAQLH-QVQLELDSVRTALDE--E 282 

Query: 258 MKAKSNLENRYMKTNTELQIAQTQKKCNRTEELLVEEIEKLRMKT-EEEARTHTEIEMFL 316 
           A++ E++ NTE I Q + K + L EE+E LR K +++A +IE+ L 
Sbjct: 283 SAARAEAEHKLALANTE--ITQWKSKFDAEVALHHEEVEDLRKKMLQKQAEYEEQIEIML 340 

Query: 317 RKEQQ--KLEERLEFWMEKYDKDTEMKQNELNALKATKASDLAHLQDLAKMIREYEQVII 374 
           +K Q K + RL+ +E DEQN+L+K +LK+E+I 
Sbjct: 341 QKISQLEKAKSRLQSEVEVLIVDLEKAQNTIAILERAK EQLEKTVNELKVRID 393 

Query: 375 EDRIEKERSKKKVKQDLLELKSVIKL 400 
           E +E E ++++ + L EL+ + L 
Sbjct: 394 ELTVELEAAQREARAALAELQKLKNL 419 

Score = 118 (17.7 bits), Expect = 1.3e-03, P = 1.3e-03 
Identities = 54/231 (23%), Positives = 108/231 (46%) 

Query: 181 DTIKELQDSATYNSLLQ ALSKERENKMHFYDIIAREEKG-RKQIISLQKQLINVKK 235 
           D +KE+ D LQ L+++ E + RE + Q+ +Q +L +V+ 
Sbjct: 218 DLLKEIHDQKVQLDNLQHVKYQLAQQLEEARRRLEDAERERSQLQAQLHQVQLELDSVRT 277 

Query: 236 EWQFE--VQSQNEY-IANLKDQLQEMKAKSNLENRYMKTNTE-LQIAQTQKKCNRTEELL 291 

E +++ E+ +A ++ + K+K + E E L+ QK+ E++ 

Sbjct: 278 ALDEESAARAEAEHKLALANTEITQWKSKFDAEVALHHEEVEDLRKKMLQKQAEYEEQIE 337 

Query: 292 VEEIEKLRMKTEEEARTHTEIEMF LRKEQQKLE--ERLEFWMEKYDKDTEMKQNELN 346 

+ ++K+ + ++R +E+E+ L K Q + ER + +EK + +++ +EL 
Sbjct: 338 IM-LQKISQLEKAKSRLQSEVEVLIVDLEKAQNTIAILERAKEQLEKTVNELKVRIDELT 396 

Query: 347 A-LKATKASDLAHLQDLAKMIREYEQVIIEDRIEKERSKKKVKQDLLELKSVI 398 

L+A + A L +L K+ YE+ + E + R KK++ DL E K + 
Sbjct: 397 VELEAAQREARAALAELQKLKNLYEKAV-EQKEALARENKKLQDDLHEAKEAL 448 

Score = 107 (16.1 bits), Expect = 2.1e-02, P = 2.1e-02 
Identities = 49/279 (17%), Positives = 124/279 (44%) 

Query: 123 ITEEGPNLPEIRHRGRFAV-EFNKMQDLVFKKPTRQTIMTTETLKKIQIDRQFFSDVIAD 181 

IE h + R A+ E K+++L K ++ + E KK+Q D + +AD 

Sbjct: 392 IDELTVELEAAQREARAALAELQKLKNLYEKAVEQKEALAREN-KKLQDDLHEAKEALAD 450 

Query: 182 TIKELQDSATYNSLLQALSKERENKMHFYDIIAREEKGRKQ--IISLQKQLINVKKEWQF 239 

++L + N+L +E+ + + R+ + RQ + LQ+ I +++ Q 
Sbjct: 451 ANRKLHELDLENARLAGEIRELQTALKESEAARRDAENRAQRALAELQQLRIEMERRLQE 510 

Query: 240 EVQSQNEYIANLKDQLQEMKAKSNLENRYMKTNTELQIAQTQKKCNRTE-ELLVEEIEKL 298 

+ + N++ +++A L + + E+ +++ EE+V+++ 

Sbjct: 511 KEEEMEALRKNMQFEIDRLTAA--LADAEARMKAEISRLKKKYQAEIAELEMTVDNLNRA 568 

Query: 299 RMKTEEEARTHTEIEMFLRKEQQKLEERLEFWMEKYDKDTEMKQNELNALKATKASDLAH 358 

++ ++ + +E L+ + + +L+ +++Y + Q +++AL A + + 

Sbjct: 569 NIEAQKTIKKQSEQLKILQASLEDTQRQLQQTLDQYALAQRKVSALSA-ELEECKV 623 

Query: 359 LQDLAKMIREYEQVIIEDRIEKERSKKKVKQDLLELKSVIKLQ 401 

DA R+ ++ +E+ 4 V +L +K+ ++ + 

Sbjct: 624 ALDNAIRARKQAEIDLEEANGRITDLVSVNNNLTAIKNKLETE 666 



Pedant information for DKFZphtes3_7b22, frame 2 



Report for DKFZphtes3_7b22.2 

[LENGTH] 443 
[MW] 51917.95 
[pI] 6.18 
[HOMOL] PIR:S28589 trichohyalin - rabbit 2e-08 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 7e-07 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 7e-07 
[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1322] 5e-06 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YPR141c] 1e-05 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YPR141c] 1e-05 
[FUNCAT] 11.01 stress response [S. cerevisiae, YPR141c] 1e-05 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YPR141c] 1e-05 
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YPR141c] 1e-05 
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YPR141c] 1e-05 
[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YPR141c] 1e-05 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YPR141c] 1e-05 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOR216c] 3e-05 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 6e-05 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR095w] 6e-05 
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YER008c] 1e-04 
[FUNCAT] 08.16 extracellular transport [S. cerevisiae, YER008c] 1e-04 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YER008c] 1e-04 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YDR356w] 2e-04 
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YDL207w] 4e-04 
[FUNCAT] 04.07 rna transport [S. cerevisiae, YDL207w] 4e-04 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YKL201c] 5e-04 
[EC] 3.6.1.32 Myosin ATPase 3e-08 
[PIRKW] phosphotransferase 6e-06 
[PIRKW] citrulline 8e-06 
[PIRKW] tandem repeat 1e-07 
[PIRKW] heart 6e-06 
[PIRKW] polymorphism 4e-06 
[PIRKW] serine/threonine-specific protein kinase 6e-06 
[PIRKW] DNA binding 8e-08 






[PIRKW] muscle contraction 1e-07 
[PIRKW] actin binding 3e-08 
[PIRKW] ATP 3e-08 
[PIRKW] thick filament 1e-07 
[PIRKW] phosphoprotein 3e-08 
[PIRKW] glycoprotein 4e-06 
[PIRKW] skeletal muscle 1e-07 
[PIRKW] calcium binding 8e-06 
[PIRKW] alternative splicing 3e-08 
[PIRKW] coiled coil 3e-08 
[PIRKW] P-loop 3e-08 
[PIRKW] heptad repeat 4e-06 
[PIRKW] methylated amino acid 3e-08 
[PIRKW] basement membrane 4e-06 
[PIRKW] cardiac muscle 6e-06 
[PIRKW] extracellular matrix 4e-06 
[PIRKW] hydrolase 3e-08 
[PIRKW] membrane protein 4e-06 


PIRKW] 




PIRKW] 




PIRKW] 


ha i y ftp— Cl fi 


SUPFAM1 


Illy UOAII llCuv y ^ilOJLIl JC \J o 


jure nri j 


uiiasDi^iicu OCX/ lliL vL i yL s^ctiiiL piULCin Miiascs ve 


[SUPFAM] calmodulin repeat homology 9e-06 
[SUPFAM] myosin motor domain homology 3e-08 
[SUPFAM] trichohyalin 8e-06 
[SUPFAM] protein kinase homology 6e-06 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 12 
[PROSITE] TYR_PHOSPHO_SITE 2 
[PROSITE] PKC_PHOSPHO_SITE 4 
[PROSITE] ASN_GLYCOSYLATION 1 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 10.61 % 



SEQ MEEDSLEDSNLPPKVWHSEMTVSVTGEPPSTVEEEGIPKETDIEIIPEIPETLEPLSLPD 

SEG xxxxxxxxxxxxxxxxxxxxxxx. 

PRD cccccccccccccccccceeeeeccccccceeeeecccccceeeeeeccccccccccccc 

SEQ VLRISAVLEDTTDQLSILNYIMPVQYEGRQSICVKSREMNLEGTNLDKLPMASTITKIPS 

SEG 

PRD chhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ PLITEEGPNLPEIRHRGRFAVEFNKMQDLVFKKPTRQTIMTTETLKKIQIDRQFFSDVIA 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ DTIKELQDSATYNSLLQALSKERENKMHFYDIIAREEKGRKQIISLQKQLINVKKEWQFE 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ VQSQNEYIANLKDQLQEMKAKSNLENRYMKTNTELQIAQTQKKCNRTEELLVEEIEKLRM 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ KTEEEARTHTEIEMFLRKEQQKLEERLEFWMEKYDKDTEMKQNELNALKATKASDLAHLQ 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ DLAKMIREYEQVIIEDRIEKERSKKKVKQDLLELKSVIKLQAWWRGTMIRREIGGFKMPK 

SEG x 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccc 

SEQ DKVDSKDSKGKGKGKDKRRGKKK 

SEG xxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccc 



Prosite for DKFZphtes3_7b22.2 

PS00001 285->289 ASN_GLYCOSYLATION PDOC00001 
PS00004 152->156 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 164->167 PKC_PHOSPHO_SITE PDOC00005 
PS00005 182->185 PKC_PHOSPHO_SITE PDOC00005 
PS00005 280->283 PKC_PHOSPHO_SITE PDOC00005 
PS00005 383->386 PKC_PHOSPHO_SITE PDOC00005 
PS00006 5->9 CK2_PHOSPHO_SITE PDOC00006 
PS00006 30->34 CK2_PHOSPHO_SITE PDOC00006 
PS00006 41->45 CK2_PHOSPHO_SITE PDOC00006 
PS00006 57->61 CK2_PHOSPHO_SITE PDOC00006 
PS00006 104->108 CK2_PHOSPHO_SITE PDOC00006 
PS00006 182->186 CK2_PHOSPHO_SITE PDOC00006 
PS00006 243->247 CK2_PHOSPHO_SITE PDOC00006 
PS00006 262->266 CK2_PHOSPHO_SITE PDOC00006 
PS00006 271->275 CK2_PHOSPHO_SITE PDOC00006 
PS00006 302->306 CK2_PHOSPHO_SITE PDOC00006 
PS00006 308->312 CK2_PHOSPHO_SITE PDOC00006 
PS00006 310->314 CK2_PHOSPHO_SITE PDOC00006 
PS00007 261->269 TYR_PHOSPHO_SITE PDOC00007 
PS00007 184->193 TYR_PHOSPHO_SITE PDOC00007 
PS00009 218->222 AMIDATION PDOC00009 
PS00009 439->443 AMIDATION PDOC00009 
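
The casein kinase II entries (PS00006) in the Prosite listing for this clone can be spot-checked against the peptide with a regular expression. The sketch below assumes the PROSITE pattern [ST]-x(2)-[DE] for PS00006 and 1-based start positions; `PEPTIDE` is the frame 2 translation given earlier for DKFZphtes3_7b22, and the names `PEPTIDE`, `ck2_site_starts` and `listed` are hypothetical. A bare pattern scan may report additional candidates that the original annotation did not list.

```python
import re

# Frame 2 peptide of DKFZphtes3_7b22, in blocks of ten residues.
PEPTIDE = (
    "MEEDSLEDSN LPPKVWHSEM TVSVTGEPPS TVEEEGIPKE TDIEIIPEIP "
    "ETLEPLSLPD VLRISAVLED TTDQLSILNY IMPVQYEGRQ SICVKSREMN "
    "LEGTNLDKLP MASTITKIPS PLITEEGPNL PEIRHRGRFA VEFNKMQDLV "
    "FKKPTRQTIM TTETLKKIQI DRQFFSDVIA DTIKELQDSA TYNSLLQALS "
    "KERENKMHFY DIIAREEKGR KQIISLQKQL INVKKEWQFE VQSQNEYIAN "
    "LKDQLQEMKA KSNLENRYMK TNTELQIAQT QKKCNRTEEL LVEEIEKLRM "
    "KTEEEARTHT EIEMFLRKEQ QKLEERLEFW MEKYDKDTEM KQNELNALKA "
    "TKASDLAHLQ DLAKMIREYE QVIIEDRIEK ERSKKKVKQD LLELKSVIKL "
    "QAWWRGTMIR REIGGFKMPK DKVDSKDSKG KGKGKDKRRG KKK"
).replace(" ", "")


def ck2_site_starts(sequence: str) -> list[int]:
    """1-based start positions of [ST]-x(2)-[DE] matches, overlaps included."""
    # The lookahead lets overlapping sites (for example 308 and 310) both match.
    return [m.start() + 1 for m in re.finditer(r"(?=[ST]..[DE])", sequence)]


listed = [5, 30, 41, 57, 104, 182, 243, 262, 271, 302, 308, 310]
found = ck2_site_starts(PEPTIDE)
print([pos for pos in listed if pos not in found])
```

An empty list means that every listed PS00006 start position satisfies the pattern in the peptide as transcribed.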



(No Pfam data available for DKFZphtes3_7b22.2) 
DKFZphtes3_7dl7 



group: testes derived 

DKFZphtes3_7dl7 encodes a novel 633 amino acid protein with weak similarity to human KIAA0454. 
Pfam predicts a TNFR/NGFR cysteine-rich region. 

No informative BLAST results; no predictive Prosite or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to KIAA0454 

complete cDNA, complete cds, EST hits 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 3608 bp 

Poly A stretch at pos. 3587, polyadenylation signal at pos. 3570 



1 GGGAAGTTAC GGCGAAGTCC ACCCAGCGTT TCTCAGGCAA TCTGAAGGCA 
51 AATCCTGTTT AGACCCAGGC GAAGGTTCCT GGTGACCCAG GCTCTCACCA 
101 GCCAATTGTC CCTTGCCGTC CTCCTGAGGG TATCTGGAGC TTCAGTGCTG 
151 TGTGCTCTTG GCCTCCACAC TGGGGATGCC ACTGACTCCC ACTGTCCAGG 
201 GCTTCCAGTG GACTCTCCGA GGCCCTGATG TAGAAACTTC CCCATTCGGT 
251 GCACCAAGAG CAGCCTCACA TGGTGTGGGC CGACATCAAG AGCTGCGAGA 
301 TCCAACAGTC CCTGGCCCCA CCTCTTCTGC CACAAACGTC AGCATGGTGG 
351 TATCTGCCGG CCCTTGGTCC GGTGAGAAGG CAGAGATGAA CATTCTAGAA 
401 ATCAACAAGA AATCGCGCCC CCAGCTGGCA GAGAACAAAC AGCAGTTCAG 
451 AAACCTCAAA CAGAAATGTC TTGTAACTCA AGTGGCCTAC TTCCTGGCCA 
501 ACCGGCAAAA TAATTACGAC TATGAAGACT GCAAAGACCT CATAAAATCT 
551 ATGCTGAGGG ATGAGCGGCT GCTCACAGAA GAGAAGCTTG CAGAGGAGCT 
601 CGGGCAAGCT GAGGAGCTCA GGCAATATAA AGTCCTGGTT CACTCTCAGG 
651 AACGAGAGCT GACCCAGTTA AGGGAGAAGT TACAGGAAGG GAGAGATGCC 
701 TCCCGCTCAT TGAATCAGCA TCTCCAGGCC CTCCTCACTC CGGATGAGCC 
751 GGACAACTCC CAGGGACGGG ACCTCCGAGA ACAGCTGGCT GAGGGATGTA 
801 GGCTGGCACA GCACCTCGTC CAAAAGCTCA GCCCAGAAAA TGATGACGAT 
851 GAGGATGAAG ATGTTAAAGT TGAGGAGGCT GAGAAAGTAC AGGAATTATA 
901 TGCCCCCAGG GAGGTGCAGA AGGCTGAAGA AAAGGAAGTC CCTGAGGACT 
951 CACTGGAGGA GTGTGCCATC ACTTGTTCAA ATAGCCACCA CCCTTGTGAG 
1001 TCCAACCAGC CTTACGGGAA CACCAGAATC ACATTTGAGG AAGACCAAGT 
1051 CGACTCAACT CTCATTGACT CATCCTCTCA TGATGAATGG TTGGATGCTG 
1101 TATGCATTAT CCCAGAAAAT GAAAGTGATC ATGAGCAAGA GGAAGAAAAA 
1151 GGGCCAGTGT CTCCCAGGAA TCTGCAGGAG TCTGAAGAGG AGGAAGCCCC 
1201 CCAGGAGTCC TGGGATGAAG GTGATTGGAC TCTCTCAATT CCTCCTGACA 
1251 TGTCTGCCTC ATACCAGTCT GACAGGAGCA CCTTTCACTC AGTAGAGGAA 
1301 CAGCAAGTCG GCTTGGCTCT TGACATAGGC AGACATTGGT GTGATCAAGT 
1351 GAAAAAGGAG GACCAAGAGG CCACAAGTCC CAGGCTCAGC AGGGAGCTGC 
1401 TGGATGAGAA AGAGCCTGAA GTCTTGCAGG ACTCACTGGA TAGATTTTAT 
1451 TCAACTCCTT TTGAGTACCT GGAACTGCCT GACTTATGCC AGCCCTACAG 
1501 AAGTGACTTT TACTCATTGC AGGAACAACA CCTTGGCTTG GCTCTTGACT 
1551 TGGACAGAAT GAAAAAGGAC CAAGAAGAGG AAGAAGACCA AGGCCCACCA 
1601 TGCCCCAGGC TCAGCAGAGA GCTGCCGGAG GTAGTAGAGC CTGAGGACTT 
1651 GCAGGACTCA CTGGATAGAT GGTATTCGAC TCCTTTCAGT TATCCAGAAC 
1701 TGCCTGATTC ATGCCAGCCC TACGGAAGTT GCTTTTACTC ATTGGAGGAA 
1751 GAACACGTTG GCTTTTCTCT TGACGTGGAT GAAATTGAAA AGTACCAAGA 
1801 AGGGGAAGAA GATCAAAAGC CACCATGCCC CAGGCTCAAC GAGGTGCTGA 
1851 TGGAAGCAGA AGAGCCTGAA GTCTTGCAGG ACTCACTGGA TAGATGTTAT 
1901 TCGACTACTT CAACTTACTT TCAACTACAT GCCTCATTCC AGCAGTACAG 
1951 AAGTGCCTTT TACTCATTTG AGGAACAGGA CGTCAGCTTG GCCCTTGACG 
2001 TGGACAATAG GTTTTTTACT TTGACAGTGA TAAGGCACCA CCTGGCCTTC 
2051 CAGATGGGAG TCATATTCCC ACACTAAGCA GCCCTTACTA AGCTGAGAGA 
2101 TGTCATTGCT GCAGGCAGGA CCTATAGGCA CATGTAGGTT TGAATGAAAC 
2151 TGTAGTTCCC TTTGGAAGCC CAGTCATAGG ATGGGAAAGT GGGCATGGCT 
2201 CTATTCCTAT TCTCAGACCA TGCCAGTGGC CACCTGTGCT CAGTCTGAAG 
2251 ACGTTGGACC CAAGTTAGGT GTGACACGTT CACACGACTA TGTAGCACAT 
2301 GCCGGGAGTG ATCTGCCAGA CATTCTAATT TGAACCAGAT ATCTCTGGGT 
2351 AGCTACAAAG TTCCTCAGGG GTTTCATTTT GCAGGCATGT CTCTGAGCTT 
2401 CTATACCTGC TCAAGGTCAG TGTCATCTTT GTGTTTAGCT CATCCAAAGG 
2451 TGTTACCCTG GTTTCATTGA ACCTAACCCC ATTCTTTGTA TCTTCAGTGT 
2501 TGGTTTGTTT TAGCTGATCC ATCTGTAACA CAGGAGGGAT CCTTGGCTGA 
2551 GGATTGTATT TCAGAACCAC TGACTGCTCT TGACAGTTGT TAACCCACTA 
2601 GGCTCCTTTG AGTAGAGAAG CCATAGTCCT TCAGCCTCCA ATTGATATCA 
2651 ATACTTAGGA AGACCACAGC TAGACGGACA AACAGCATTG GGAGGCCTTA 
2701 GTCCTGCTCC TTTCAATTCC ATCCTGTAAA GAACAGGAGT CAGGAGCCGC 
2751 TGGCAAGAGA CAGCATGTCA CCTGGGACTC TGCCAGTGCA GAATATGAAC 
2801 AATGCCATGT TCTTGCAGAA AATGCTTAGC CTGAGTTTCA TAGGAGGTAA 
2851 TCACCAGACA ACTGCAGAAT GTAGAACACT GAGCAGGACA ACTGACCTGT 
2901 CTCCTTCACA CAGTCCACGT CACCACGAAT CACACAACAA AAAGGAGGAG 
2951 AGATATTTTG GGTTCAGAAG AAGTAAATGA TAATGTAGCT ACATTTCTTT 
3001 AGTTATTTTG AACCCCAAAT ATTTCCTCAT CTTTTTGTTG TTGTCATTGA 
3051 TTTTGGTGAC ATGGACTTGT TTGTAGAGGA CAGGTCAGCT GTCTGGCTCA 
3101 ATGGTCTACA TTCTGAAGTT GTCTGAAAAT GTCTTCATGA TTAAATTCAG 
3151 CCTAAACGTT TCATCAAGAA CACTACAGAG TCGATACTGT GAGTTTCCAA 
3201 CCTCAGCCCA TCTGTGGGCA GAGAAGGTCT AGTTTGTCCA TCAGCATTAT 
3251 CATGATATCA GGACTGGTTA CTTGGTTAAG GAGGGGTCTA GGAGATCTGT 
3301 CCCTTTTAGA GACACCTTAC TTATGATGAA GTATTTGGGA GAGTGGTTTT 
3351 TCAAAGTAGA AATGTCCTGT ATTCCAGTGA TCATCCTCTA AACGTTTTAT 
3401 CATTTATTAA TCATCCCTGC CTGTGTCTAT TATTATATTC ATATCTCTAC 
3451 GCTGGAAATT TGCTGCCTCA ATGTTTACTG TGCCTTTGTT TTTGCTAGTG 
3501 TGTGTTGTTG AAAAAAAAAC ATTCTCTGCC TGAGTTTTAA TTTTTGTCCA 
3551 AAGTTATTTT AATCTATACA ATTAAAAACT TTTGCCTATC AAAAAAAAAA 
3601 AAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 176 bp to 2074 bp; peptide length: 633 
Category: similarity to known protein 



1 MPLTPTVQGF QWTLRGPDVE TSPFGAPRAA SHGVGRHQEL RDPTVPGPTS 
51 SATNVSMVVS AGPWSGEKAE MNILEINKKS RPQLAENKQQ FRNLKQKCLV 
101 TQVAYFLANR QNNYDYEDCK DLIKSMLRDE RLLTEEKLAE ELGQAEELRQ 
151 YKVLVHSQER ELTQLREKLQ EGRDASRSLN QHLQALLTPD EPDNSQGRDL 
201 REQLAEGCRL AQHLVQKLSP ENDDDEDEDV KVEEAEKVQE LYAPREVQKA 
251 EEKEVPEDSL EECAITCSNS HHPCESNQPY GNTRITFEED QVDSTLIDSS 
301 SHDEWLDAVC IIPENESDHE QEEEKGPVSP RNLQESEEEE APQESWDEGD 
351 WTLSIPPDMS ASYQSDRSTF HSVEEQQVGL ALDIGRHWCD QVKKEDQEAT 
401 SPRLSRELLD EKEPEVLQDS LDRFYSTPFE YLELPDLCQP YRSDFYSLQE 
451 QHLGLALDLD RMKKDQEEEE DQGPPCPRLS RELPEVVEPE DLQDSLDRWY 
501 STPFSYPELP DSCQPYGSCF YSLEEEHVGF SLDVDEIEKY QEGEEDQKPP 
551 CPRLNEVLME AEEPEVLQDS LDRCYSTTST YFQLHASFQQ YRSAFYSFEE 
601 QDVSLALDVD NRFFTLTVIR HHLAFQMGVI FPH 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_7dl7, frame 2 

PIR:T00069 hypothetical protein KIAA0454 - human (fragment), N = 1, 
Score = 199, P = 1e-11 

PIR:A45592 liver stage antigen LSA-1 - Plasmodium falciparum, N = 1, 
Score = 158, P = 2.7e-07 



>PIR:T00069 hypothetical protein KIAA0454 - human (fragment) 
Length = 1,882 

HSPs: 

Score = 199 (29.9 bits), Expect = 1.0e-11, P = 1.0e-11 
Identities = 74/261 (28%), Positives = 122/261 (46%) 

Query: 117 EDCKDLIKSMLRDERLLT EEKLAEELGQAEELRQYKVLVHSQERELTQLREKLQEG 172 

+D + LI+ + + E L EEKLAEEL A +Y L+ Q REL+ LR+K++EG 

Sbjct: 964 KDLESLIQRVSQLEAQLPKNGLEEKLAEELRSASWPGKYDSLIQDQARELSYLRQKIREG 1023 
Query: 


173 


Sbjct: 


1024 


Query: 


226 


Sbjct : 


1084 


Query: 


285 


Sbjct: 


1140 


Query': 


343 


Sbjct: 


1197 


Score 


= 89 


Identities 1 


Query: 


464 


Sbjct: 


1079 


Query: 


519 


Sbjct: 


1139 


Score 


« 73 


Identities « 


Query: 


390 


Sbjct: 


10B0 


Query: 


445 


Sbjct: 


1140 


Score 


- 68 


Identities ■• 


Query: 


31 


Sbjct: 


684 


Query: 


80 


Sbjct: 


744 


Query: 


138 


Sbjct: 


804 


Score 


- 65 


Identities « 


Query: 


123 


Sbjct: 


5 


Query: 


179 


Sbjct: 


61 


Score 


= 61 


Identities 5 


Query: 


134 


Sbjct: 


855 


Query: 


189 


Sbjct: 


913 


Score 


= 57 


Identities * 


Query: 


127 


Sbjct: 


358 



E + 



+ +H 



+E 



-LQALLTPDEFDNSQGRDLREQLAEGCRLAQHLVQKLSPENDDD 225 
+ LL ++ D G+ REQLA+G +L + L KLS ++ 



RE+Q+ E+ EV + L+ ++T S+SH +S++ 



+T 



+E + 



+H E 



P + +S + S + A 

-PSHSDSIHHSSHSAVLSSKPSSTSASQGAK 1196 



ES + 



+L P + 



S FH 



13.4 bits), Expect « l.le-01, P = 1.0e-01 
35/89 {39%), Positives = 44/89 (49%) 



KD + E+DQ 



F S E E 



RLSREL E + E LQ LD 



TP S L DS + P + 



D+D + +Y 



EE + 



Expect 



4.86+00, P » 9.9e-01 
ves « 40/88 (45%) 



fDQ 



--RLSRELLD-EKEPEVLQDSLDRFYSTPFEYLELPDLCQ-PYRSD 444 
RLSREL + EK EVLQ LD TP L D + P + 



F S 



D+D + + 



EE + P 



L0.2 bits), Expect - l.le-01, P - 1.0e-01 
36/156 (23%), Positives = 68/156 (43%) 



SHGVGRHQELRDPTV PGPTSSATNVSMVVSAGPWS- 

S G +HQE + TV PPS + V A 



R QL++ KQ++++L++K L+++ F AN 



-GEKAEMNILEINKK 79 
G ++ ++ + 



+ L+K 



E G++E 



L+E L EG 



[9.8 bits), Expect - 2.2e-01, P = 2.0e-01 
» 23/96 (23%), Positives = 52/96 (54%) 

IKSMLRDERLLTEEKLAEELGQAEE LRQYKVLVHSQERELTQLREKLQEGRDASRS 178 

++ + D+ + E + E+ EE LRQ ++ V ++ +L +LR+ L ++ + 

LRQRIHDKAVALERAIDEKFSALEEKEKELRQLRLAVRERDHDLERLRDVLS SNEA 60 



Q +++LL ++G ++ EQL+ C+ Q L +++ 

TMQSMESLL RAKGLEV-EQLSTTCQNLQWLKEEM 9 3 

[9.2 bits), Expect = 5.5e-01, P » 4.2e-01 
> 27/95 (28%), Positives = 47/95 (49%) 



+E K L +LG+ EE R Y 



+S 



R+ A G 



KVLVHSQERELTQLREKLQEGRDASRSLNQHLQALLT 188 

+LV +++ L+ +LQ ++L +++L 



++ SP + DEDE 
-LEGSSPHSVPDEDE 945 



L E LL EK+A 



Q +E+ 



— RQYKVLVHSQERELTQLREKLQEGRDASRSLNQHL 183 
R+ ++L+ + L R+LE ARL L 
Query: 184 QALLTPDEPDNSQ-GRDLREQLAEGCRLAQHLVQKL 218 

P++S+ R L+ +L EG ++ + ++++ 
Sbjct: 416 VKFHA--HPESSERDRTLQVEL-EGAQVLRSRLEEV 448 

Score = 54 (8.1 bits), Expect = 2.7e+00, P = 9.3e-01 
Identities = 61/264 (23%), Positives = 121/264 (45%) 



Query: 3 LTPTVQGFQWTLRGPDVETSPFGAPRAASHGVGRHQE--LRDPTVPGPTSSATNVSMVVS 60 
L+ T Q QW L+ ++ET F + + + + L D SAT ++ 
Sbjct: 79 LSTTCQNLQW-LK-EEMETK-FSRWQKEQESIIQQLQTSLHDRNKEVEDLSAT LLCK 132 

Query: 61 AGPWSGEKAEMNILEINKKSR PQLAENKQQFRNLKQKCLVTQVAYFLANRQNNYDYE 117 
GP E AE + +K R L++ +Q L+ + + + ++ R+ 
Sbjct: 133 LGPGQSEIAEELCQRLQRKERMLQDLLSDRNKQV--LEHEMEIQGLLQSVSTREQE-SQA 189 

Query: 118 DCKDLIKSMLRDERLLTEEKLAEELGQAEELRQYKVLVHSQERELT QLREKLQEG-- 172 
+ L+++++ ER +L+LG+L ++ +Q+ E+T +L ++ +G 
Sbjct: 190 AAEKLVOALM--ERNSELQALRQYLGGRDSLMS-QAPISNQQAEVTPTGRLGKQTDQGSM 246 

Query: 173 RDASRSLNQHLQALLTPDEPDNSQGRDLREQLAEGCRLAQHLVQKLSPENDDDEDEDVKV 232 
+ SR + L A P ++ G DL + +A G L ++LS N +E E + 
Sbjct: 247 QIPSRDDSTSLTAKEDVSIPRSTLG-DL-DTVA-G LEKELS--NAKEELELMAK 295 

Query: 233 EEAEKVQELYAPREVQKAEEKEVPEDSLEECAIT 266 
+E E EL A + + +E+E+ + + ++T 
Sbjct: 296 KERESQMELSALQSMMAVQEEELQVQAADMESLT 329 




Score = 49 (7.4 bits), Expect = 6.3e+00, P = 1.0e+00 
Identities = 21/87 (24%), Positives = 39/87 (44%) 

Query: 192 PDNSQGRDLREQLAEGCRLAQHLVQKLSPENDDDEDEDVKVEEAEKVQELYAPREVQKAE 251 
P ++Q LR QL++ + Q L +KL + +EEK++ +K+ 
Sbjct: 738 PGSTQ--HLRSQLSQCKQRYQDLQEKLLLS--EATVFAQANELEKYRVMLTGESLVKQD 792 

Query: 252 EKEVPEDSLEECAI -TCSNSHHPCESNQ 278 
K++ D L++ TC S + E + 
Sbjct: 793 SKQIQVD-LQDLGYETCGRSENEAEREE 819 

Score = 46 (6.9 bits), Expect = 6.3e+00, P = 1.0e+00 
Identities = 19/77 (24%), Positives = 39/77 (50%) 

Query: 112 NNYDYEDCKDLIKSMLRDERLLTEEKLAEELGQAEELRQYKVLVHSQERELTQLREKLQ- 170 
+ ++ E+ K+ K + E ++T+E L+E QAE R+ + + + L+E+L 
Sbjct: 597 DGWEIEEDKE--KGEVMVETWTKEGLSESSLQAE-FRKLQGKLKNAHNIINLLKEQLVL 653 

Query: 171 EGRDASRSLNQHLQALLT 188 
++ + L L LT 
Sbjct: 654 SSKEGNSKLTPELLVHLT 671 
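
The Expect and P values in the HSP headers above are consistent with the usual Poisson relation P = 1 - exp(-E). The following is a minimal sketch under that assumption; the function name `p_from_expect` is illustrative and not part of the BLAST report.

```python
import math


def p_from_expect(expect: float) -> float:
    """Probability of at least one chance hit, assuming Poisson-distributed hits."""
    # -expm1(-E) equals 1 - exp(-E) and stays accurate for very small E.
    return -math.expm1(-expect)


# Expect values taken from three of the HSP headers above.
for e in (2.7, 6.3, 0.55):
    print(f"Expect = {e:.1e}, P = {p_from_expect(e):.1e}")
```

For small Expect values the two numbers are nearly identical, which is why the strong hits elsewhere in the listing report the same figure for both.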

Pedant information for DKFZphtes3_7dl7, frame 2 



Report for DKFZphtes3_7dl7.2 



[LENGTH] 633 

[MW] 72951.15 

[pI] 4.40 

[HOMOL] PIR:T00069 hypothetical protein KIAA0454 - human (fragment) 2e-11 

[BLOCKS] BL00201E 

[PROSITE] MYRISTYL 2 

[PROSITE] CK2_PHOSPHO_SITE 14 

[PROSITE] PKC_PHOSPHO_SITE 4 

[PROSITE] ASN_GLYCOSYLATION 2 

[PFAM] TNFR/NGFR cysteine-rich region 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 4.90 % 

[KW] COILED_COIL 6.95 % 



SEQ MPLTPTVQGFQWTLRGPDVETSPFGAPRAASHGVGRHQELRDPTVPGPTSSATNVSMVVS 

SEG 

PRD ccccceeeeeeeecccccccccccccccccccccccccccccccccccccceeeeeeeee 

COILS 

SEQ AGPWSGEKAEMNILEINKKSRPQLAENKQQFRNLKQKCLVTQVAYFLANRQNNYDYEDCK 

SEG 

PRD ccccccchhhhhhhheeecccchhhhhhhhhhhcccccchhhhhhhhhhcccccccccch 

COILS 
SEQ DLIKSMLRDERLLTEEKLAEELGQAEELRQYKVLVHSQERELTQLREKLQEGRDASRSLN 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ QHLQALLTPDEPDNSQGRDLREQLAEGCRLAQHLVQKLSPENDDDEDEDVKVEEAEKVQE 

SEG xxxxxxxxxxxxxxxx. . 

PRD hhhhhhhhccccccccchhhhhhhhhhhhhhhhhhhhhcccccccccchhhhhhhhhhhh 

COILS CCCCCCC 

SEQ LYAPREVQKAEEKEVPEDSLEECAITCSNSHHPCESNQPYGNTRITFEEDQVDSTLIDSS 

SEG 

PRD hhhcchhhhhhhhhhcchhhhhhhccccccccccccccccccceeeeecccccccccccc 

COILS 

SEQ SHDEWLDAVCIIPENESDHEQEEEKGPVSPRNLQESEEEEAPQESWDEGDWTLSIPPDMS 

SEG xxxxxxxxxxxxxxx 

PRD ccchhhhheeeccccccchhhhhhcccccccccchhhhhhhccccccccccccccccccc 

COILS 

SEQ ASYQSDRSTFHSVEEQQVGLALDIGRHWCDQVKKEDQEATSPRLSRELLDEKEPEVLQDS 

SEG 

PRD ccccccccchhhhhhhhhhhhhhccccccchhhhhccccccchhhhhhhhhhhheeeecc 

COILS 



SEQ LDRFYSTPFEYLELPDLCQPYRSDFYSLQEQHLGLALDLDRMKKDQEEEEDQGPPCPRLS 

SEG 

PRD hhhhhccceeeeecccccccccccchhhhhhhhhhhhhcchhhhhhhhhhcccccccccc 

COILS 

SEQ RELPEVVEPEDLQDSLDRWYSTPFSYPELPDSCQPYGSCFYSLEEEHVGFSLDVDEIEKY 

SEG 

PRD ccceeeeeccchhhhhhhhhccccccccccccccccccceeeeccceeeccccchhhhhh 

COILS 

SEQ QEGEEDQKPPCPRLNEVLMEAEEPEVLQDSLDRCYSTTSTYFQLHASFQQYRSAFYSFEE 

SEG 

PRD hcccccccccccchhhhhhhhhchhhhhccccceeecceeeehhhhhhhhhhhhhhhhhc 

COILS 



SEQ QDVSLALDVDNRFFTLTVIRHHLAFQMGVIFPH 

SEG 

PRD cchhhhhhcccchhhhhhhhhhhhhhhhhcccc 

COILS 



Prosite for DKFZphtes3_7dl7 . 2 



PS00001 54->58 ASN_GLYCOSYLATION PDOC00001 
PS00001 315->319 ASN_GLYCOSYLATION PDOC00001 
PS00005 13->16 PKC_PHOSPHO_SITE PDOC00005 
PS00005 329->332 PKC_PHOSPHO_SITE PDOC00005 
PS00005 365->368 PKC_PHOSPHO_SITE PDOC00005 
PS00005 401->404 PKC_PHOSPHO_SITE PDOC00005 
PS00006 188->192 CK2_PHOSPHO_SITE PDOC00006 
PS00006 259->263 CK2_PHOSPHO_SITE PDOC00006 
PS00006 286->290 CK2_PHOSPHO_SITE PDOC00006 
PS00006 295->299 CK2_PHOSPHO_SITE PDOC00006 
PS00006 300->304 CK2_PHOSPHO_SITE PDOC00006 
PS00006 317->321 CK2_PHOSPHO_SITE PDOC00006 
PS00006 336->340 CK2_PHOSPHO_SITE PDOC00006 
PS00006 345->349 CK2_PHOSPHO_SITE PDOC00006 
PS00006 372->376 CK2_PHOSPHO_SITE PDOC00006 
PS00006 427->431 CK2_PHOSPHO_SITE PDOC00006 
PS00006 447->451 CK2_PHOSPHO_SITE PDOC00006 
PS00006 505->509 CK2_PHOSPHO_SITE PDOC00006 
PS00006 522->526 CK2_PHOSPHO_SITE PDOC00006 
PS00006 597->601 CK2_PHOSPHO_SITE PDOC00006 
PS00008 25->31 MYRISTYL PDOC00008 
PS00008 207->213 MYRISTYL PDOC00008 



Pfam for DKFZphtes3_7dl7.2 
HMM_NAME TNFR/NGFR cysteine-rich region 

HMM *CpeGtYtDWNHvpqClpCtrCePEMGQYMvqPCTwTQNTVC* 

C+ ++ + N+ ++ + ++ + +++ +++ ++VC 

Query 274 CESNQPYG-NT-RITFEEDQVDS — TLIDSSSHDEWLDAVC 310 
DKFZphtes3_7j3 



group: cell cycle 

DKFZphtes3_7j3.2 encodes a novel 628 amino acid putative protein kinase, which is related to 
C-TAK1, the Cdc25C-associated protein kinase. 

Cdc25C is a protein phosphatase that controls entry into mitosis by dephosphorylating Cdc2. 
Cdc25C function is itself regulated by phosphorylation: phosphorylation of serine 216 
mediates the binding of 14-3-3 protein to Cdc25C. C-TAK1 (Cdc twenty-five C associated protein 
kinase) phosphorylates Cdc25C on serine 216 in vitro. The new protein is closely related to 
C-TAK1 and should therefore also be involved in cell-cycle regulation. 

The new protein can find application in modulating/blocking the cell cycle. 



strong similarity to serine/threonine-specific protein kinases 

complete cDNA, complete cds, potential start at Bp 128, few EST hits 

Sequenced by BMFZ 

Locus : unknown 

Insert length: 3443 bp 

Poly A stretch at pos. 3399, polyadenylation signal at pos. 3376 



1 GTGCTTTACT GCGCGCTCTG GTACTGCTGT GGCTCCCCGT CCTGGTGCGG 
51 GACCTGTGCC CCGCGCTTCA GCCCTCCCCG CACAGCCTAC TGATTCCCCT 
101 GCCGCCCTTG CTCACCTCCT GCTCGCCATG GAGTCGCTGG TTTTCGCGCG 
151 GCGCTCCGGC CCCACTCCCT CGGCCGCAGA GCTAGCCCGG CCGCTGGCGG 
201 AAGGGCTGAT CAAGTCGCCC AAGCCCCTAA TGAAGAAGCA GGCGGTGAAG 
251 CGGCACCACC ACAAGCACAA CCTGCGGCAC CGCTACGAGT TCCTGGAGAC 
301 CCTGGGCAAA GGCACCTACG GGAAGGTGAA GAAGGCGCGG GAGAGCTCGG 
351 GGCGCCTGGT GGCCATCAAG TCAATCCGGA AGGACAAAAT CAAAGATGAG 
401 CAAGATCTGA TGCACATACG GAGGGAGATT GAGATCATGT CATCACTCAA 
451 CCACCCTCAC ATCATTGCCA TCCATGAAGT GTTTGAGAAC AGCAGCAAGA 
501 TCGTGATCGT CATGGAGTAT GCCAGCCGGG GCGACCTTTA TGACTACATC 
551 AGCGAGCGGC AGCAGCTCAG TGAGCGCGAA GCTAGGCATT TCTTCCGGCA 
601 GATCGTCTCT GCCGTGCACT ATTGCCATCA GAACAGAGTT GTCCACCGAG 
651 ATCTCAAGCT GGAGAACATC CTCTTGGATG CCAATGGGAA TATCAAGATT 
701 GCTGACTTCG GCCTCTCCAA CCTCTACCAT CAAGGCAAGT TCCTGCAGAC 
751 ATTCTGTGGG AGCCCCCTCT ATGCCTCGCC AGAGATTGTC AATGGGAAGC 
801 CCTACACAGG CCCAGAGGTG GACAGCTGGT CCCTGGGTGT TCTCCTCTAC 
851 ATCCTGGTGC ATGGCACCAT GCCCTTTGAT GGGCATGACC ATAAGATCCT 
901 AGTGAAACAG ATCAGCAACG GGGCCTACCG GGAGCCACCT AAACCCTCTG 
951 ATGCCTGTGG CCTGATCCGG TGGCTGTTGA TGGTGAACCC CACCCGCCGG 
1001 GCCACCCTGG AGGATGTGGC CAGTCACTGG TGGGTCAACT GGGGCTACGC 
1051 CACCCGAGTG GGAGAGCAGG AGGCTCCGCA TGAGGGTGGG CACCCTGGCA 
1101 GTGACTCTGC CCGCGCCTCC ATGGCTGACT GGCTCCGGCG TTCCTCCCGC 
1151 CCCCTCCTGG AGAATGGGGC CAAGGTGTGC AGCTTCTTCA AGCAGCATGC 
1201 ACCTGGTGGG GGAAGCACCA CCCCTGGCCT GGAGCGCCAG CATTCGCTCA 
1251 AGAAGTCCCG CAAGGAGAAT GACATGGCCC AGTCTCTCCA CAGTGACACG 
1301 GCTGATGACA CTGCCCATCG CCCTGGCAAG AGCAACCTCA AGCTGCCAAA 
1351 GGGCATTCTC AAGAAGAAGG TGTCAGCCTC TGCAGAAGGG GTACAGGAGG 
1401 ACCCTCCGGA GCTCAGCCCA ATCCCTGCGA GCCCAGGGCA GGCTGCCCCG 
1451 CTGCTCCCCA AGAAGGGCAT TCTCAAGAAG CCCCGACAGC GCGAGTCTGG 
1501 CTACTACTCC TCTCCCGAGC CCAGTGAATC TGGGGAGCTC TTGGACGCAG 
1551 GCGACGTGTT TGTGAGTGGG GATCCCAAGG AGCAGAAGCC TCCGCAAGCT 
1601 TCAGGGCTGC TCCTCCATCG CAAAGGCATC CTCAAACTCA ATGGCAAGTT 
1651 CTCCCAGACA GCCTTGGAGC TCGCGGCCCC CACCACCTTC GGCTCCCTGG 
1701 ATGAACTCGC CCCACCTCGC CCCCTGGCCC GGGCCAGCCG ACCCTCAGGG 
1751 GCTGTGAGCG AGGACAGCAT CCTGTCCTCT GAGTCCTTTG ACCAGCTGGA 
1801 CTTGCCTGAA CGGCTCCCAG AGCCCCCACT GCGGGGCTGT GTGTCTGTGG 
1851 ACAACCTCAC GGGGCTTGAG GAGCCCCCCT CAGAGGGCCC TGGAAGCTGC 
1901 CTGAGGCGCT GGCGGCAGGA TCCTTTGGGG GACAGCTGCT TTTCCCTGAC 
1951 AGACTGCCAG GAGGTGACAG CGACCTACCG ACAGGCACTG AGGGTCTGCT 
2001 CAAAGCTCAC CTGAGTGGAG TAGGCATTGC CCCAGCCCGG TCAGGCTCTC 
2051 AGATGCAGCT GGTTGCACCC CGAGGGGAGA TGCCTTCTCC CCCACCTCCC 
2101 AGGACCTGCA TCCCAGCTCA GAAGGCTGAG AGGGTTTGCA GTGGAGCCCT 
2151 GAGCAGGGCT GGATATGGGA AGTAGGCAAA TGAAATGCGC CAAGGGTTCA 
2201 GTGTCTGTCT TCAGCCCTGC TGAACGAAGA GGATACTAAA GAGAGGGGAA 
2251 CGGGAATGCC CGCGACAGAG TCCACATTGC CTGTTTCTTG TGTACATGGG 
2301 GGGGCCACAG AGACCTGGAA AGAGAACTCT CCCAGGGCCC ATCTCCTGCA 
2351 TCCCATGAAT ACTCTGTACA CATGGTGCCT TCTAAGGACA GCTCCTTCCC 
2401 TACTCATTCC CTGCCCAAGT GGGGCCAGAC CTCTTTACAC ACACATTCCC 
2451 GTTCCTACCA ACCACCAGAA CTGGATGGTG GCACCCCTAA TGTGCATGAG 
2501 GCATCCTGGG AATGGTCTGG AGTAACGCTT CGTTATTTTT ATTTTTATTT 
2551 TTATTTATTT ATTTATTTTT TTGAGACGGA GTTTCGCTCT TGGTGCCCAG 

2601 GCTAGAGTGC AATGGCGCGA TCTCAGCTCA CCTCAACCTC CGCCTCCCGG 

2651 GTTCAAGCGA TTCTCCTGCC TCAGCCTCCC TAGTAGCTGG GATTACAGGC 

2701 GCCCGCCACC ATGCCCGGCT AATTTTGTAT TTTTAGTAGA GACAGGGTTT 

2751 CTCCATGTTG GTCAGGCTGG TCTCAAACTC CCGACCTCAG GTGATCCACC 

2801 CACCTCGGCC TCCCAAAGTG CTGGGATTAC AGGCGTGAGC CACCGCGCCC 

2851 CACCTAACCC TTCCTTATTT AGCCTAGGAG TAAGAGAACA CAATCTCTGT 

2901 TTCTTCAATG GTTCTCTTCC CTTTTCCATC CTCCAAACCT GGCCTGAGCC 

2951 TCCTGAAGTT GCTGCTGTGA ATCTGAAAGA CTTGAAAAGC CTCCGCCTGC 

3001 TGTGTGGACT TCATCTCAAG GGGCCCAGCC TCCTCTGGAC TCCACCTTGG 

3051 ACCTCAGTGA CTCAGAACTT CTGCCTCTAA GCTGCTCTAA AGTCCAGACT 

3101 ATGGATGTGT TCTCTAGGCC TTCAGGACTC TAGAATGTCC ATATTTATTT 

3151 TTATGTTCTT GGCTTTGTGT TTTAGGAAAA GTGAATCTTG CTGTTTTCAA 

3201 TAATGTGAAT GCTATGTTCT GGGAAAATCC ACTATGACAT CTAAGTTTTG 

3251 TGTACAGAGA GATATTTTTG CAACTATTTC CACCTCCTCC CACAACCCCC 

3301 CACACTCCAC TCCACACTCT TGAGTCTCTT TACCTAATGG TCTCTACCTA 

3351 ATGGACCTCC GTGGCCAAAA AGTACCATTA AAACCAGAAA GGTGATTGGA 

3401 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAA 



BLAST Results 



No BLAST result 



Medline entries 



98202387: 

C-TAK1 protein kinase phosphorylates human Cdc25C on serine 216 and 
promotes 14-3-3 protein binding. 



Peptide information for frame 2 



ORF from 128 bp to 2011 bp; peptide length: 628 
Category: strong similarity to known protein 
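
The ORF coordinates and the peptide length above are related by simple arithmetic: with 1-based inclusive coordinates that do not include the stop codon (as appears to be the convention in this report), the peptide length is the nucleotide span divided by three. A minimal sketch, with the helper name `orf_peptide_length` chosen here purely for illustration:

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Number of codons in an ORF given 1-based, inclusive nucleotide coordinates.

    Assumes the end coordinate is the last base of the final sense codon,
    i.e. the stop codon is not part of the span.
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


# Coordinates quoted in the report above: ORF from 128 bp to 2011 bp.
print(orf_peptide_length(128, 2011))
```

A span that is not a multiple of three would indicate a frame or coordinate error, which is why the sketch raises instead of rounding.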



1 MESLVFARRS GPTPSAAELA RPLAEGLIKS PKPLMKKQAV KRHHHKHNLR 
51 HRYEFLETLG KGTYGKVKKA RESSGRLVAI KSIRKDKIKD EQDLMHIRRE 
101 IEIMSSLNHP HIIAIHEVFE NSSKIVIVME YASRGDLYDY ISERQQLSER 
151 EARHFFRQIV SAVHYCHQNR VVHRDLKLEN ILLDANGNIK IADFGLSNLY 
201 HQGKFLQTFC GSPLYASPEI VNGKPYTGPE VDSWSLGVLL YILVHGTMPF 
251 DGHDHKILVK QISNGAYREP PKPSDACGLI RWLLMVNPTR RATLEDVASH 
301 WWVNWGYATR VGEQEAPHEG GHPGSDSARA SMADWLRRSS RPLLENGAKV 
351 CSFFKQHAPG GGSTTPGLER QHSLKKSRKE NDMAQSLHSD TADDTAHRPG 
401 KSNLKLPKGI LKKKVSASAE GVQEDPPELS PIPASPGQAA PLLPKKGILK 
451 KPRQRESGYY SSPEPSESGE LLDAGDVFVS GDPKEQKPPQ ASGLLLHRKG 
501 ILKLNGKFSQ TALELAAPTT FGSLDELAPP RPLARASRPS GAVSEDSILS 
551 SESFDQLDLP ERLPEPPLRG CVSVDNLTGL EEPPSEGPGS CLRRWRQDPL 
601 GDSCFSLTDC QEVTATYRQA LRVCSKLT 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_7j3, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_7j3, frame 2 



Report for DKFZphtes3_7j3.2 



[LENGTH] 628 

[MW] 69612.39 

[pI] 9.01 

[HOMOL] TREMBL:AB011109_1 gene: "KIAA0537"; product: "KIAA0537 protein"; Homo sapiens 
mRNA for KIAA0537 protein, complete cds. 1e-152 

[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YDR477w] 5e-66 
[FUNCAT] 11.01 stress response [S. cerevisiae, YDR477w] 5e-66 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDR477w] 5e-66 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YLR096w] 6e-54 
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YLR096w] 6e-54 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YDR507c] 8e-52 
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YDR507c] 8e-52 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YKL101w] 9e-51 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL101w] 9e-51 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPL141c] 1e-45 
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YPL153c] 6e-44 
[FUNCAT] 03.22.01 cell cycle check point proteins [S. cerevisiae, YPL153c] 6e-44 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YPL153c] 6e-44 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YPL153c] 6e-44 
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YMR001c] 2e-42 
[FUNCAT] 10.02.11 key kinases [S. cerevisiae, YBL105c] 3e-34 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YKL139w CTK1 - carboxy-terminal domain] 2e-28 



[FUNCAT] 03.01 cell growth [S. cerevisiae, YFR014c] 4e-28 
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YGL180w] 2e-26 
[FUNCAT] 06.13.04 lysosomal and vacuolar degradation [S. cerevisiae, YGL180w] 2e-26 
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YGL180w] 2e-26 
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YER129w] 4e-26 
[FUNCAT] 02.19 metabolism of energy reserves (glycogen, trehalose) [S. cerevisiae, YPL031c] 5e-24 
[FUNCAT] 01.04.04 regulation of phosphate utilization [S. cerevisiae, YPL031c] 5e-24 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YHL007c] 6e-24 
[FUNCAT] 10.05.11 key kinases [S. cerevisiae, YHL007c] 6e-24 
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YNR031c] 1e-22 
[FUNCAT] 10.03.11 key kinases [S. cerevisiae, YNR031c] 1e-22 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YDR523c] 8e-22 
[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YDL108w] 6e-21 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YFL033c] 6e-21 
[FUNCAT] 10.05.09 regulation of g-protein activity [S. cerevisiae, YBL016w] 7e-19 
[FUNCAT] 10.04.11 key kinases [S. cerevisiae, YDL159w] 3e-18 
[FUNCAT] 01.02.04 regulation of nitrogen and sulphur utilization [S. cerevisiae, YNL183c] 1e-17 
[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YNL183c] 1e-17 
[FUNCAT] 05.07 translational control [S. cerevisiae, YDR283c] 2e-17 
[FUNCAT] 09.04 biogenesis of cytoskeleton [S. cerevisiae, YNL020c] 4e-16 
[FUNCAT] 04.03.99 other trna-transcription activities [S. cerevisiae, YOR061w] 1e-15 
[FUNCAT] 10.04.99 other nutritional-response activities [S. cerevisiae, YJR059w] 5e-15 
[FUNCAT] c energy conversion [M. genitalium, MG109] 3e-12 
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YBR097w] 2e-08 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YBR097w] 2e-08 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YBR097w] 2e-08 
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YBR097w] 2e-08 
[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YHR079c] 8e-05 
[FUNCAT] 01.06.10 regulation of lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YHR079c] 8e-05 



[BLOCKS] BL00479C Phorbol esters / diacylglycerol binding domain proteins 
[BLOCKS] BL00239B Receptor tyrosine kinase class II proteins 
[BLOCKS] BL00107A Protein kinases ATP-binding region proteins 
[SCOP] .1.1.9 MAP kinase Erk2 [rat Rattus norvegicus 1e-77 
[SCOP] .1.1.8 MAP kinase p38 [human (Homo sapiens) 4e-68 
[SCOP] .1.1.7 (1-350) Twitchin, kinase domain [Caenorhabditi 2e-85 
[SCOP] .1.1.6 Twitchin, kinase domain [California sea har 1e-80 
[SCOP] .1.1.5 gamma-subunit of glycogen phosphorylase kinas 2e-76 
[SCOP] .1.2.4 insulin receptor [Human (Homo sapiens) 1e-69 
[SCOP] .1.1.4 cAMP-dependent PK, catalytic subunit [mouse (Mu 1e-84 
[SCOP] .2.3 Fibroblast growth factor receptor 1 [human (Hom 1e-68 
[SCOP] .1.3 cAMP-dependent PK, catalytic subunit [bovine (Bo 9e-85 
[SCOP] .2.2 (168-437) c-src tyrosine kinase [human (Hom 1e-69 
[SCOP] .1.2 cAMP-dependent PK, catalytic subunit [pig (Su 1e-85 
[SCOP] .1.2.1 (167-437) Haemopoetic cell kinase Hck [huma 5e-66 
[SCOP] .1.1.11 Casein kinase-1, CK1 [Schizosaccharomyces pombe 9e-47 
[SCOP] .1.1.1 Cyclin-dependent PK [Human (Homo sapiens) 1e-75 
[SCOP] .1.1.10 Casein kinase-1, CK1 [rat (Rattus norvegicus) 5e-54 
[EC] 2.7.1.38 Phosphorylase kinase 1e-36 
[EC] 2.7.1.123 Ca2+/calmodulin-dependent protein kinase 4e- 
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dlirk 


5. 
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5. 
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5. 
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5. 
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dlfmk 3 


5. 
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dlcdka 
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5. 


1 


dlcsn 


5. 
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dljsua 


5. 
[EC] 2.7.1.128 [Acetyl-CoA carboxylase] kinase 1e-61 
[EC] 2.7.1.117 Myosin-light-chain kinase 2e-40 
[EC] 2.7.1.109 [Hydroxymethylglutaryl-CoA reductase (NADPH)] kinase 1e-61 
[EC] 2.7.1.37 Protein kinase 7e-42 
[PIRKW] phosphotransferase 6e-66 
[PIRKW] nucleus 1e-64 
[PIRKW] calcium 7e-35 
[PIRKW] duplication 1e-38 
[PIRKW] tandem repeat 4e-39 
[PIRKW] phorbol ester binding 1e-38 
[PIRKW] zinc 1e-38 
[PIRKW] cell cycle control 1e-42 
[PIRKW] serine/threonine-specific protein kinase 8e-68 
[PIRKW] oncogene 1e-40 
[PIRKW] phospholipid binding 1e-38 
[PIRKW] autophosphorylation 1e-64 
[PIRKW] brain 1e-40 
[PIRKW] heterotetramer 2e-36 
[PIRKW] mitosis 7e-42 
[PIRKW] polymer 1e-35 
[PIRKW] magnesium 6e-66 
[PIRKW] ATP 8e-68 
[PIRKW] polyprotein 1e-40 
[PIRKW] phosphoprotein 1e-64 
[PIRKW] apoptosis 4e-39 
[PIRKW] glycoprotein 7e-42 
[PIRKW] leucine zipper 3e-35 
[PIRKW] skeletal muscle 7e-35 
[PIRKW] protein kinase 5e-41 
[PIRKW] cAMP binding 3e-38 
[PIRKW] testis 9e-36 
[PIRKW] miri np nnrl pnti dp hi ndina 2p — 49 
[PIRKW] calcium binding 8e-39 
[PIRKW] alternative splicing 3e-37 
[PIRKW] P-loop 2e-49 
[PIRKW] lipoprotein 2e-33 
[PIRKW] segmentation 1e-33 
[PIRKW] core protein 1e-40 
[PIRKW] muscle 7e-35 
[PIRKW] myristylation 2e-33 
[PIRKW] EF hand 8e-39 
[PIRKW] cell division 2e-40 
[PIRKW] calmodulin binding 4e-40 
[SUPFAM] ribosomal protein S6 kinase II 5e-36 
[SUPFAM] fibronectin type III repeat homology 3e-33 
[SUPFAM] immunoglobulin homology 3e-33 
[SUPFAM] calcium-dependent protein kinase 8e-39 
[SUPFAM] AMP-activated protein kinase 6e-66 
[SUPFAM] protein kinase akt 3e-42 
[SUPFAM] protein kinase SPK1 1e-42 
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 8e-68 
[SUPFAM] Ca2+/calmodulin-dependent protein kinase 3e-37 
[SUPFAM] calmodulin repeat homology 8e-39 
[SUPFAM] cAMP receptor protein cyclic nucleotide-binding domain homology 6e-33 
[SUPFAM] protein kinase C zeta 1e-36 
[SUPFAM] Dictyostelium cAMP-dependent protein kinase catalytic chain 1e-34 
[SUPFAM] death-associated protein kinase 4e-39 
[SUPFAM] pleckstrin repeat homology 3e-42 
[SUPFAM] ankyrin repeat homology 4e-39 
[SUPFAM] protein kinase homology 8e-68 
[SUPFAM] Ca2+/calmodulin-dependent protein kinase II 8e-41 
[SUPFAM] protein kinase C zinc-binding repeat homology 1e-38 
[SUPFAM] twitchin 3e-33 
[SUPFAM] protein kinase C delta 1e-38 
[SUPFAM] cGMP-dependent protein kinase 6e-33 
[SUPFAM] protein kinase cdr1 7e-42 
[SUPFAM] protein kinase C C2 region homology 3e-37 
[SUPFAM] protein kinase C alpha 3e-37 
[SUPFAM] yeast protein kinase C 5e-36 
[SUPFAM] kinase-related transforming protein 1e-41 
[SUPFAM] kinase interaction domain homology 1e-42 
[SUPFAM] gag-akt polyprotein 1e-40 
[SUPFAM] Vidi — / LdlinuUUX in UcpcllUCHL ^lUlcXll Kinase 1 *ic 1U 
[SUPFAM] protein kinase C mu 4e-33 
[PROSITE] PROTEIN_KINASE_ATP 2 
[PROSITE] RGD 1 
[PROSITE] MYRISTYL 4 
[PROSITE] CAMP_PHOSPHO_SITE 3 
[PROSITE] CK2_PHOSPHO_SITE 13 
[PROSITE] TYR_PHOSPHO_SITE 2 
[PROSITE] PKC_PHOSPHO_SITE 12 
[PROSITE] ASN_GLYCOSYLATION 2 
[PROSITE] PROTEIN_KINASE_ST 1 
[PFAM] Eukaryotic protein kinase domain 
[KW] All_Alpha 
[KW] 3D 
[KW] LOW_COMPLEXITY 10.51 % 



SEQ MESLVFARRSGPTPSAAELARPLAEGLIKSPKPLMKKQAVKRHHHKHNLRHRYEFLETLG 
SEG xxxxxxxxxxxx 
1ctpE HHHHHHHHHHHHHHHCCCCCCCC--GGGEEEEEEEE 

SEQ KGTYGKVKKARESSGRLVAIKSIRKDKIKDEQDLMHIRREIEIMSSLNHPHIIAIHEVFE 
SEG 
1ctpE CTTTEEEEEEEETTTEEEEEEEEEHHHHHHHCCHHHHHHHHHHHHCCCTTTBCCEEEEEE 

SEQ NSSKIVIVMEYASRGDLYDYISERQQLSEREARHFFRQIVSAVHYCHQNRVVHRDLKLEN 
SEG 
1ctpE ETTEEEEEEECTTTTBHHHHHHHHCCCCHHHHHHHHHHHHHHHHHHHHCCEECCCCCGGG 

SEQ ILLDANGNIKIADFGLSNLYHQGKFLQTFCGSPLYASPEIVNGKPYTGPEVDSWSLGVLL 
SEG 
1ctpE EEETTTTCEEECCTTTTEET-TTT-BCCCCCCGGGCCHHHHHCCCBC-HHHHHHHHHHHH 

SEQ YILVHGTMPFDGHDHKILVKQISNGAYREPPKPSDACGLIRWLLMVNPTRRATLEDVASH 
SEG 
1ctpE HHHHHCCTTTTTTTHHHHHHHHHHCCCCCTTTCHHHHHHHHHTTTTTGGGTTTHHHHHHC 

SEQ WWVNWGYATRVGEQEAPHEGGHPGSDSARASMADWLRRSSRPLLENGAKVCSFFKQHAPG 
SEG 
1ctpE GG 

SEQ GGSTTPGLERQHSLKKSRKENDMAQSLHSDTADDTAHRPGKSNLKLPKGILKKKVSASAE 
SEG 
1ctpE 

SEQ GVQEDPPELSPIPASPGQAAPLLPKKGILKKPRQRESGYYSSPEPSESGELLDAGDVFVS 
SEG xxxxxxxxxxxx...xxxxxxxxxxxxxxx 
1ctpE 

SEQ GDPKEQKPPQASGLLLHRKGILKLNGKFSQTALELAAPTTFGSLDELAPPRPLARASRPS 
SEG xxxxxxxxxxxxxx 
1ctpE 

SEQ GAVSEDSILSSESFDQLDLPERLPEPPLRGCVSVDNLTGLEEPPSEGPGSCLRRWRQDPL 
SEG 
1ctpE 

SEQ GDSCFSLTDCQEVTATYRQALRVCSKLT 
SEG 
1ctpE 



Prosite for DKFZphtes3_7j3.2 



PS00001 121->125 ASN_GLYCOSYLATION PDOC00001 
PS00001 576->580 ASN_GLYCOSYLATION PDOC00001 
PS00004 290->294 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 337->341 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 413->417 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 30->33 PKC_PHOSPHO_SITE PDOC00005 
PS00005 74->77 PKC_PHOSPHO_SITE PDOC00005 
PS00005 82->85 PKC_PHOSPHO_SITE PDOC00005 
PS00005 122->125 PKC_PHOSPHO_SITE PDOC00005 
PS00005 142->145 PKC_PHOSPHO_SITE PDOC00005 
PS00005 148->151 PKC_PHOSPHO_SITE PDOC00005 
PS00005 289->292 PKC_PHOSPHO_SITE PDOC00005 
PS00005 327->330 PKC_PHOSPHO_SITE PDOC00005 
PS00005 339->342 PKC_PHOSPHO_SITE PDOC00005 
PS00005 373->376 PKC_PHOSPHO_SITE PDOC00005 
PS00005 377->380 PKC_PHOSPHO_SITE PDOC00005 
PS00005 616->619 PKC_PHOSPHO_SITE PDOC00005 
PS00006 15->19 CK2_PHOSPHO_SITE PDOC00006 
PS00006 133->137 CK2_PHOSPHO_SITE PDOC00006 
PS00006 148->152 CK2_PHOSPHO_SITE PDOC00006 
PS00006 227->231 CK2_PHOSPHO_SITE PDOC00006 
PS00006 293->297 CK2_PHOSPHO_SITE PDOC00006 
PS00006 331->335 CK2_PHOSPHO_SITE PDOC00006 
PS00006 377->381 CK2_PHOSPHO_SITE PDOC00006 
PS00006 391->395 CK2_PHOSPHO_SITE PDOC00006 
PS00006 461->465 CK2_PHOSPHO_SITE PDOC00006 
PS00006 511->515 CK2_PHOSPHO_SITE PDOC00006 
PS00006 523->527 CK2_PHOSPHO_SITE PDOC00006 
PS00006 578->582 CK2_PHOSPHO_SITE PDOC00006 
PS00006 606->610 CK2_PHOSPHO_SITE PDOC00006 
PS00007 453->460 TYR_PHOSPHO_SITE PDOC00007 
PS00007 453->461 TYR_PHOSPHO_SITE PDOC00007 
PS00008 320->326 MYRISTYL PDOC00008 
PS00008 324->330 MYRISTYL PDOC00008 
PS00008 347->353 MYRISTYL PDOC00008 
PS00008 360->366 MYRISTYL PDOC00008 
PS00016 134->137 RGD PDOC00016 
PS00107 59->82 PROTEIN_KINASE_ATP PDOC00100 
PS00107 59->86 PROTEIN_KINASE_ATP PDOC00100 
PS00108 171->184 PROTEIN_KINASE_ST PDOC00100 
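
Several of the short Prosite motifs listed above can be reproduced with plain regular expressions. The sketch below is illustrative only: the `PATTERNS` dictionary and the `scan` helper are names introduced here, the regular expressions are hand translations of four well-known PROSITE patterns, and only residues 101-160 of the peptide are scanned. It reports 1-based start positions, which correspond to the first number of each range in the table.

```python
import re

# Hand-translated PROSITE patterns (a small subset, for illustration).
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",      # PS00006: [ST]-x(2)-[DE]
    "RGD": r"RGD",                          # PS00016: R-G-D
}


def scan(name, seq, offset=1):
    """Return start positions of all matches, including overlapping ones.

    `offset` is the sequence position of seq[0], so a fragment can be
    scanned while keeping full-length numbering.
    """
    # A lookahead makes finditer advance one residue at a time.
    regex = re.compile(f"(?=({PATTERNS[name]}))")
    return [m.start() + offset for m in regex.finditer(seq)]


# Residues 101-160 of the DKFZphtes3_7j3.2 peptide given earlier.
FRAGMENT = "IEIMSSLNHPHIIAIHEVFENSSKIVIVMEYASRGDLYDYISERQQLSEREARHFFRQIV"

for name in PATTERNS:
    print(name, scan(name, FRAGMENT, offset=101))
```

Within this fragment the start positions agree with the corresponding rows of the table (for example 121 for ASN_GLYCOSYLATION and 134 for RGD).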



Pfam for DKFZphtes3_7j3.2 



HMM_NAME Eukaryotic protein kinase domain 

HMM *YeigRilGeGsFGtVYkCiWrTGelVAIKIIkkrsma FIRE I 
YE+++++G+G++G+V+K+++ +G++VAIK I+K++++ ++REI 
Query 53 YEFLETLGKGTYGKVKKARESSGRLVAIKSIRKDKIKDEQDLMHIRREI 101 

HMM qlMRrLnHPNIIRFYDwFedddDHIYMIMEYMeGGDLFDYIrrngpMsEw 
+IM +LNHP+II + ++FE ++ I ++MEY+ GDL+DYI+++ ++SE+ 
Query 102 EIMSSLNHPHIIAIHEVFE-NSSKIVIVMEYASRGDLYDYISERQQLSER 150 

HMM elrf IMyQILrGMeYLHSMgllHRDLKPENILIDeNgqIKIcDFGLARqM 
E+R++++QI++++ Y+H ++++HRDLK ENIL+D NG+IKI+DFGL+ ++ 
Query 151 EARHFFRQIVSAVHYCHQNRVVHRDLKLENILLDANGNIKIADFGLSNLY 200 

HMM nnYerMttfCGTPWYMMAPEVIImg. nyYttkVDMWSFGCILWEMMTGep 
+ + ++ TFCG+P Y +PE+ ++G +Y +++VD WS+G++L++++ G+ 
Query 201 HQGKFLQTFCGSPLYA-SPEI-VNGKPYTGPEVDSWSLGVLLYILVHGTM 248 

HMM PFyddnMemlmrliqrfrrpfWpnCSeElyDFMrwCWnyDPekRPTFrQI 
PF+++ ++ I + +++ +P S+ + ++RW++ ++P++R T +++ 
Query 249 PFDGHDHKILVKQISNGAYREPPKPSD-ACGLIRWLLMVNPTRRATLEDV 297 

HMM LnHPWF* 
H W+ 
Query 298 ASHWWV 303 
DKFZphtes3_7j8 



group: testes derived 

DKFZphtes3_7j8 encodes a novel 410 amino acid protein nearly identical to human 
WUGSC:H_DJ1159O04.1. 

The novel protein contains an additional C-terminal domain, which is not present in 
WUGSC:H_DJ1159O04.1. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

WUGSC:H_DJ1159O04.1 similarity to YBL104p 

verifies and extends the genmodel WUGSC:H_DJ1159O04.1 
similarity to S.cerevisiae YBL104p 

Sequenced by BMFZ 

Locus: /map="7p21-p22" 

Insert length: 3353 bp 

Poly A stretch at pos. 3231, no polyadenylation signal found 

1 GCAAAATATG TTGTATTTGT GGCATAGTTC ATATTTACAC TATCATAAAA 
51 TTATGGCCGA GAAGTTAAAT ATTCTAAATG TGTCAACATA GTTCTCTGTA 

101 AAACTGACTT ATTTTCCAAA TATATTTTGA AATAAAACAA TATAAAAATG 

151 TTTTCTGTTT TTAGGAATGG TGGAAAGCAG CAGACATAAT TGGAGTGGGT 

201 TGGATAAGCA AAGTGATATT CAAAATTTAA ATGAAGAGAG AATCTTAGCT 

251 TTACAGCTTT GTGGGTGGAT AAAGAAAGGA ACGGATGTAG ACGTGGGGCC 

301 ATTTTTGAAC TCCCTTGTAC AAGAAGGGGA ATGGGAAAGA GCTGCTGCTG 

351 TGGCATTGTT CAACTTGGAT ATTCGCCGAG CAATCCAAAT CCTGAATGAA 

401 GGGGCATCTT CTGAAAAAGG AGATCTGAAT CTCAATGTGG TAGCAATGGC 

451 TTTATCGGGT TATACGGATG AGAAGAACTC CCTTTGGAGA GAAATGTGTA 

501 GCACACTG£G ATTACAGCTA AATAACCCGT ATTTGTGTGT CATGTTTGCA 

551 TTTCTGACAA GTGAAACAGG ATCTTACGAT GGAGTTTTGT ATGAAAACAA 

601 AGTTGCAGTA CGTGACAGAG TGGCATTTGC TTGTAAATTC CTTAGTGATA 

651 CTCAGTTAAA TAGATACATC GAAAAGTTGA CCAATGAAAT GAAAGAGGCT 

701 GGAAATTTGG AAGGAATTTT GCTTACAGGC CTTACTAAAG ATGGAGTGGA 

751 CTTAATGGAG AGTTATGTTG ATAGAACTGG AGATGTTCAA ACAGCAAGTT 

801 ACTGTATGTT ACAGGGTTCA CCTTTAGATG TTCTTAAAGA TGAAAGGGTT 

851 CAGTACTGGA TTGAGAATTA TAGAAATTTA TTAGATGCCT GGAGGTTTTG 

901 GCATAAACGA GCTGAATTTG ATATTCACAG GAGTAAGTTG GATCCCAGTT 

951 CCAAGCCTTT AGCACAAGTT TTTGTGAGTT GCAATTTCTG TGGCAAGTCA 
1001 ATCTCCTACA GCTGTTCAGC TGTGCCTCAT CAGGGCAGAG GTTTTAGTCA 
1051 GTATGGTGTG AGTGGCTCAC CAACGAAATC TAAAGTCACA AGTTGTCCTG 
1101 GCTGTCGAAA ACCACTTCCT CGATGTGCGC TTTGTCTCAT TAATATGGGA 
1151 ACACCAGTTT CTAGCTGTCC TGGAGGAACC AAATCAGATG AAAAAGTGGA 
1201 CTTGAGCAAG GACAAAAAAT TAGCCCAATT TAACAACTGG TTTACATGGT 
1251 GTCATAATTG CAGGCACGGT GGACATGCTG GACATATGCT TAGTTGGTTC 
1301 AGGGACCATG CAGAGTGCCC TGTGTCTGCA TGCACGTGTA AATGTATGCA 
1351 GTTGGATACA ACGGGGAATC TGGTACCTGC AGAGACTGTC CAGCCATAAA 
1401 ATGTTACCAC CTTAAGAGAA CCCTTCAAGT GTGGAGCTTT CTAGTAGGTG 
1451 TCCTTCATAG CTCAGAAACA TACCTCAGAA CAAGCCATTC ATGACTTACC 
1501 TGTAATGGGA AAATAAATCA TTCTATCAGA TCAGCAGTTT TGATGTTTGA 
1551 GTGATTTTGA TATGCTTCAC AGAGACAAAT GCTGCCAAAA TAAACATCGA 
1601 AGTATAGACA TGAGTTCTGT TCAGCAGGTT GAAAAGTCTG ATTTAGAAAA 
1651 ACTTTCTAAG TTTTGGTTGA AATTATGAAC ACTCTAGAAG CAGAATTTCT 
1701 GGAAGAGCCA AGAACAGACT TTGAGCCTAT ATCTTCAAAG CTGAAACTGG 
1751 ATATCTTTCA ATAAAATATG TGCACTTTTA AAATAAAATG ACTAATTCTG 
1801 TGATTCAGAC AATAGTTTTA AGTTCAGCTG TGCTTAGATT TCTTTCAGAT 
1851 TAATTTAAAA TTATAGATTT TTACTTTTAG AATTGCAGAG CCCCTATCCC 
1901 ACACTGGAGA ATATTTTTTA TTACTGTCTG TTATATATGT GTCTATGTGT 
1951 GTGTGTATAT TTATGTGTGT ATGTATAAAT ATGTACTTTT TAAAGGAGCC 
2001 TTTTCCCTCC TTTGATTTTA AGATAAGCAA TCTTTTGGCA TAACATTATC 
2051 GTCTTCCTAG AAAAGCCAAG ATGAAGAATC TATCTTACAA CTTTTTCTCT 
2101 TCAGTAGAGA AAAACATGTA CCATTTCAGG TGAACATACA AAATTTTCAC 
2151 TTTCTACCTT TTGCCTTCCA ATGTCCTGAT TTGTCTTCAA AGGTTTTTCT 
2201 CCATATTAAT TTGTCATCTT ATCCTCATCA CCTGAGAACA TTTTACTGCA 
2251 TACAAAGTCT ATGCAAGATT ATATGTAACT AGCCATTTAG TATAATCTAT 
2301 GTCAGTGTTT CTGTGCTGTC AAATTCCGTC CTGATTTGGA ATACCATACC 
2351 TTGTTCTTTC CAAGGTAGAC TAGGAAGTGT TGGGGAAATA GGGTCACTTC 
2401 AGAGACCATT TTAGATGTAA GTTTTTAAAT GTAAGTGTTA CTGGGGCTAA 
2451 GTCAGGGACT TTATTTAAAA CATTTTTTTT TTCTCATTTC ATAGCTAGAT 
2501 AGTTGTAAGA GAAATACAAA GAATTTACAA GATGCTTCTC TGTCATCTGC 
2551 CGTATGCAGA GGGACTGAAC TAGGAATTTT GTAGTTGAAG CTGTGTTCAT 
2601 AAAGAGTAAA TCTTATTTTA TAGATTTTGG AGAAATAAAA CAAGAATTTT 
2651 AAGAGCTTTC GTATTAGCAG TTTTGCCTTA TAAAAACTAA GATTTGTCAG 
2701 ATTAGTTTGA GGTGTAACCT AAATATTAAA AGTAGATTAA ATTTATTTTT 
2751 TACCTTGAGT GTCTGATACA TAAAACCCTT TTCTAGGAAA ACATTGGAAG 
2801 TAGTACATAT TTACTCTAAA TGTCTCACCT GCATGACAGT CTTTTCAAAT 
2851 GAAAGACATG GTAATTGCAA TTTTTTTTTA AAGATTGCTA TTAAGGGTAC 
2901 TTTTTCCAGC CTTCATTTGA GTAAATCTTA ATTGATTTCA TTTTATTAAC 
2951 ATATACCCTT TACCTTTAAT ATTTCATTTG AAGTGTTCCT TTCAAACTTA 
3001 CTGTCTTAAA TATGAAAGTC AGCTTTAAGT AATGTCAGAC TCATATGCAT 
3051 TTTCATTCTC ATTAGCTAAA GTAAAATGTA AAATTATCTC AAATAGTTAC 
3101 AAGTTTTGGA AATACAGTAT AAAACATGAA TGTAAAGTCT ATTATGTAAT 
3151 ATGCTTATTT GTAATCCTAA TATATGAGGG TGACATTTTT AAGATTGTAT 
3201 GTATGTGTCA ACCTCTTAAA TGTTTTCTGT GAAAAAAAAA AAAAAAAAAA 
3251 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3301 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3351 AAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 167 bp to 1396 bp; peptide length: 410 
Category: known protein 
Classification: unclassified 
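
The start of the peptide can be checked against the cDNA by translating from the first base of the ORF with the standard genetic code. The sketch below is a minimal illustration; `CODON_TABLE` and `translate` are names introduced here, and the input is bases 167-196 of the nucleotide sequence listed above.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate in frame from the first base, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":  # stop codon ends the peptide
            break
        peptide.append(aa)
    return "".join(peptide)


# Bases 167-196 of the DKFZphtes3_7j8 cDNA (start of the ORF).
print(translate("ATGGTGGAAAGCAGCAGACATAATTGGAGT"))
```

The result can be compared with the first ten residues of the peptide listed below.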



1 MVESSRHNWS GLDKQSDIQN LNEERILALQ LCGWIKKGTD VDVGPFLNSL 

51 VQEGEWERAA AVALFNLDIR RAIQILNEGA SSEKGDLNLN VVAMALSGYT 

101 DEKNSLWREM CSTLRLQLNN PYLCVMFAFL TSETGSYDGV LYENKVAVRD 

151 RVAFACKFLS DTQLNRYIEK LTNEMKEAGN LEGILLTGLT KDGVDLMESY 

201 VDRTGDVQTA SYCMLQGSPL DVLKDERVQY WIENYRNLLD AWRFWHKRAE 

251 FDIHRSKLDP SSKPLAQVFV SCNFCGKSIS YSCSAVPHQG RGFSQYGVSG 

301 SPTKSKVTSC PGCRKPLPRC ALCLINMGTP VSSCPGGTKS DEKVDLSKDK 

351 KLAQFNNWFT WCHNCRHGGH AGHMLSWFRD HAECPVSACT CKCMQLDTTG 

401 NLVPAETVQP 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_7j8, frame 2 

PIR:S45391 probable membrane protein YBL104c - yeast (Saccharomyces 
cerevisiae), N = 2, Score = 446, P = 4.5e-47 

TREMBL:AC004982_1 gene: "WUGSC:H_DJ1159O04.1"; Homo sapiens PAC clone 

DJ1159O04 from 7p21-p22, complete sequence., N = 1, Score = 2038, P = 
7.6e-211 



>TREMBL:AC004982_1 gene: "WUGSC:H_DJ1159O04.1"; Homo sapiens PAC clone 
DJ1159O04 from 7p21-p22, complete sequence. 
Length = 379 



HSPs: 



Score = 2038 (305.8 bits), Expect = 7.6e-211, P = 7.6e-211 
Identities = 379/379 (100%), Positives = 379/379 (100%) 



Query: 1 MVESSRHNWSGLDKQSDIQNLNEERILALQLCGWIKKGTDVDVGPFLNSLVQEGEWERAA 60 

MVESSRHNWSGLDKQSDIQNLNEERILALQLCGWIKKGTDVDVGPFLNSLVQEGEWERAA 

Sbjct: 1 MVESSRHNWSGLDKQSDIQNLNEERILALQLCGWIKKGTDVDVGPFLNSLVQEGEWERAA 60 

Query: 61 AVALFNLDIRRAIQILNEGASSEKGDLNLNVVAMALSGYTDEKNSLWREMCSTLRLQLNN 120 

AVALFNLDIRRAIQILNEGASSEKGDLNLNVVAMALSGYTDEKNSLWREMCSTLRLQLNN 

Sbjct: 61 AVALFNLDIRRAIQILNEGASSEKGDLNLNVVAMALSGYTDEKNSLWREMCSTLRLQLNN 120 
Query: 121 PYLCVMFAFLTSETGSYDGVLYENKVAVRDRVAFACKFLSDTQLNRYIEKLTNEMKEAGN 180 

PYLCVMFAFLTSETGSYDGVLYENKVAVRDRVAFACKFLSDTQLNRYIEKLTNEMKEAGN 
Sbjct: 121 PYLCVMFAFLTSETGSYDGVLYENKVAVRDRVAFACKFLSDTQLNRYIEKLTNEMKEAGN 180 

Query: 181 LEGILLTGLTKDGVDLMESYVDRTGDVQTASYCMLQGSPLDVLKDERVQYWIENYRNLLD 240 

LEGILLTGLTKDGVDLMESYVDRTGDVQTASYCMLQGSPLDVLKDERVQYWIENYRNLLD 
Sbjct: 181 LEGILLTGLTKDGVDLMESYVDRTGDVQTASYCMLQGSPLDVLKDERVQYWIENYRNLLD 240 

Query: 241 AWRFWHKRAEFDIHRSKLDPSSKPLAQVFVSCNFCGKSISYSCSAVPHQGRGFSQYGVSG 300 

AWRFWHKRAEFDIHRSKLDPSSKPLAQVFVSCNFCGKSISYSCSAVPHQGRGFSQYGVSG 
Sbjct: 241 AWRFWHKRAEFDIHRSKLDPSSKPLAQVFVSCNFCGKSISYSCSAVPHQGRGFSQYGVSG 300 

Query: 301 SPTKSKVTSCPGCRKPLPRCALCLINMGTPVSSCPGGTKSDEKVDLSKDKKLAQFNNWFT 360 

SPTKSKVTSCPGCRKPLPRCALCLINMGTPVSSCPGGTKSDEKVDLSKDKKLAQFNNWFT 
Sbjct: 301 SPTKSKVTSCPGCRKPLPRCALCLINMGTPVSSCPGGTKSDEKVDLSKDKKLAQFNNWFT 360 

Query: 361 WCHNCRHGGHAGHMLSWFR 379 

WCHNCRHGGHAGHMLSWFR 
Sbjct: 361 WCHNCRHGGHAGHMLSWFR 379 



Pedant information for DKFZphtes3_7j8, frame 2 



Report for DKFZphtes3_7j8.2 



[LENGTH] 410 

[MW] 45862.45 

[pI] 6.51 

[HOMOL] TREMBL:AC004982_1 gene: "WUGSC:H_DJ1159O04.1"; Homo sapiens PAC clone DJ1159O04 
from 7p21-p22, complete sequence. 0.0 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YBL104c] 7e-48 

[BLOCKS] BL00028 Zinc finger, C2H2 type, domain proteins 

[BLOCKS] BL00534A Ferrochelatase proteins 

[PIRKW] transmembrane protein 2e-46 

[KW] All_Alpha 



SEQ 
PRD 



MVESSRHNWSGLDKQSDIQNLNEERILALQLCGWIKKGTDVDVGPFLNSLVQEGEWERAA 
cccccccccccccccccchhhhhhhhhhhhhhccccccccccccccccccccccchhhhh 



SEQ 
PRD 



AVALFNLDIRRAIQILNEGASSEKGDLNLNVVAMALSGYTDEKNSLWREMCSTLRLQLNN 
hhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhccc 



SEQ 
PRD 



PYLCVMFAFLTSETGSYDGVLYENKVAVRDRVAFACKFLSDTQLNRYIEKLTNEMKEAGN 
ccccceeeccccccccccceeeccchhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhcc 



SEQ 
PRD 



LEGILLTGLTKDGVDLMESYVDRTGDVQTASYCMLQGSPLDVLKDERVQYWIENYRNLLD 
cceeeeeeccccchhhhhhhhcccccceeeeeccccccccccchhhhhhhhhhhhhhhhh 



SEQ 
PRD 



AWRFWHKRAEFDIHRSKLDPSSKPLAQVFVSCNFCGKSISYSCSAVPHQGRGFSQYGVSG 
hhhhhhhhhhhhhhcccccccccceeeeeeeccccccccccccccccccccccccccccc 



SEQ 
PRD 



SPTKSKVTSCPGCRKPLPRCALCLINMGTPVSSCPGGTKSDEKVDLSKDKKLAQFNNWFT 
ccccccccccccccccccceeeeecccccccccccccccccceeeehhhhhhhhhcceee 



SEQ 
PRD 



WCHNCRHGGHAGHMLSWFRDHAECPVSACTCKCMQLDTTGNLVPAETVQP 
eecccccccccchhhhhhhhhccccccccccccccccccccccccccccc 



(No Prosite data available for DKFZphtes3_7j8.2) 



(No Pfam data available for DKFZphtes3_7j8.2) 
DKFZphtes3_7p10 



group: Cell Cycle 

DKFZphtes3_7p10.1 encodes a novel 422 amino acid putative protein, which is closely related to 
the Xenopus laevis XPMC2 protein. 

In fission yeast, the kinases Wee1 and Mik1 ensure that mitosis is initiated only after 
DNA synthesis is complete. Yeast in which both Wee1 and Mik1 kinases are defective exhibit a 
mitotic catastrophe phenotype. Xenopus XPMC2 rescues several different yeast mitotic 
catastrophe mutants defective in Wee1/Mik1 kinase function. The XPMC2 protein is localised in 
the nucleus in Xenopus oocytes. The new protein is the human orthologue of XPMC2. 

The new protein can find application in modulating/blocking the cell cycle. 



strong similarity to XPMC2 protein 
complete cDNA, complete cds, EST hits 
Sequenced by BMFZ 
Locus: /map="9q34" 
Insert length: 2380 bp 

Poly A stretch at pos. 2341, polyadenylation signal at pos. 2318 



1 AGCGTGCGTG CTGAGGTATG CGCAACGCGT GCGGGGTCTC TTCCGGAGTC 
51 TTTTCCTGGA CGGGGTCCCT GCGGTGGGTG TGTTTCGGCC TGGCCTGGGC 
101 AGGCGCTTGT GCTGCCAGGG CGCCGGGCCC GGGGAGGCCG GGGTCTCGGG 
151 TGGCCGCCGG CCCAGGCGCT GGACGGCAGC AGGATGGGGA AGGCGAAGGT 
201 CCCCGCCTCC AAGCGCGCCC CGAGCAGCCC CGTGGCTAAG CCGGGTCCTG 
251 TCAAGACGCT CACTCGGAAG AAAAACAAGA AGAAAAAAAG GTTTTGGAAA 
301 AGCAAGGCGC GGGAAGTAAG CAAGAAGCCA GCAAGCGGCC CCGGTGCTGT 
351 GGTGCGACCT CCAAAGGCAC CAGAAGACTT TTCTCAAAAC TGGAAGGCGC 
401 TGCAAGAGTG GCTGCTGAAA CAAAAATCTC AGGCCCCAGA AAAGCCTCTT 
451 GTCATCTCTC AGATGGGTTC CAAAAAGAAG CCCAAAATTA TCCAGCAAAA 
501 CAAAAAAGAG ACCTCGCCTC AAGTGAAGGG AGAGGAGATG CCGGCAGGAA 
551 AAGACCAGGA GGCCAGCAGG GGCTCTGTTC CTTCAGGTTC CAAGATGGAC 
601 AGGAGGGCGC CAGTACCTCG CACCAAGGCC AGTGGAACAG AGCACAATAA 
651 GAAAGGAACC AAGGAAAGGA CAAATGGTGA TATTGTTCCA GAACGAGGGG 
701 ACATCGAGCA TAAGAAGCGG AAAGCTAAGG AGGCAGCCCC AGCCCCACCC 
751 ACCGAGGAAG ACATCTGGTT TGACGACGTG GACCCAGCGG ATATCGAAGC 
801 TGCCATAGGT CCAGAGGCGG CCAAGATAGC GAGGAAACAG TTGGGTCAGA 
851 GCGAGGGCAG CGTCAGCCTC AGCCTCGTGA AAGAGCAGGC CTTCGGCGGC 
901 CTGACAAGAG CCTTAGCCTT GGACTGTGAG ATGGTGGGCG TGGGCCCTAA 
951 GGGGGAGGAG AGCATGGCCG CCCGTGTGTC CATCGTGAAC CAGTATGGGA 
1001 AGTGCGTTTA TGACAAGTAC GTCAAACCAA CTGAGCCCGT GACGGACTAT 
1051 AGGACAGCGG TCAGTGGGAT TCGGCCTGAG AACCTCAAGC AGGGAGAAGA 
1101 GCTTGAAGTT GTTCAGAAGG AAGTGGCAGA GATGCTGAAG GGCAGAATTC 
1151 TAGTGGGGCA CGCTCTGCAT AATGACCTAA AGGTACTATT TCTTGATCAT 
1201 CCAAAAAAGA AGATTCGGGA CACACAGAAA TATAAACCTT TCAAGAGTCA 
1251 AGTAAAGAGT GGAAGGCCGT CTCTGAGACT ACTTTCAGAG AAGATCCTTG 
1301 GGCTCCAGGT CCAGCAGGCG GAGCACTGTT CAATTCAGGA TGCCCAGGCA 
1351 GCAATGAGGC TGTACGTCAT GGTGAAGAAG GAGTGGGAGA GCATGGCCCG 
1401 AGACAGGCGC CCCCTGCTGA CTGCTCCAGA CCACTGCAGT GACGACGCCT 
1451 AGCAGTCCTG CCCTGCTGCT GCTGCCGCCC CGCTACAGAG GCAATGTGAC 
1501 CAGTCACAGG GACAGATCAC ATCTCCCCAG AGTGGCAACT CTGGTGAAAC 
1551 CTTTTCAGAA TCATGGCAGA GGGGCGTGGC GTGGTGCTAC TGAGAAGGTC 
1601 CTCCTTCCTC TTGACTTTGT GGTCTGAAAC CTGGTCTTAC TGTCCATGTG 
1651 TGTTTGGGCC CGGATGGTCA GGGTGGGGAG CAGGGACGGC CATGGGCACG 
1701 CCTGGCCACG CTTTACCGAC TGCTGACCCC CTGGGCCAGG TGAGGTTGGG 
1751 GCCTGTGGGC CGCCAGTCCA TACGGTGCTG TCACTGCCCA TCTTCGGTGA 
1801 CACCCTGGGG TGAGGTGCTC AGCACCTTCC TCTCGAGGAG CCACATTTTC 
1851 CTCCTTTGTG TTAGGGGACA TAACAAGCTC TGCTGGGCTT GAGGGACCCA 
1901 GACCAGGTGT CTGCAGTCAG CTCCTGAGAC ACAGCTGGCC GGCACAACAG 
1951 GTGTTACATC AGGGGTTTCC TGTGGCCGTT TGAACTTTGA GCATTTATCT 
2001 AAATTAAATT GGCCCAGGGT TGGCTGGTGG GTCACCCAGC AGAGGCTTCT 
2051 CCCCATAGCA CGAGGATGTG TTGCCTGGGC ACGGTGACTG CGGTTATTCC 
2101 TGGAGGTCGG CAGACATGCC AACCTTGGGC TATTTGAGCT GGAGAAGCTA 
2151 TGTGATGCTA GCCGGTGGCT TTCTGGGCTA GGCCCCAGTT TGAGGCTCCC 
2201 CTGGGAACTA GAGCCAGGAA CAGCCAGTGG CACTGACAAG GGGACGGAGT 
2251 CCAAGGCGTT ATTGGGCCAC CTGACAGCTG GACAGAAAAG GGGCAGACAC 
2301 ACCGAGGATG CGATTTAAAA TAAATGCAGA TGTTTACTTG GAAAAAAAAA 
2351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 



BLAST Results 
Entry HSAC2099 from database EMBL: 

*** SEQUENCING IN PROGRESS *** Genomic sequence from Human 9q34; HTGS 
phase 1, 2 unordered pieces. 

Score = 5055, P = 0.0e+00, identities = 1011/1011 
8 exons Bp 104219-116190 



Medline entries 



95157530: 

Cloning and expression of a Xenopus gene that prevents mitotic 
catastrophe in fission yeast. 



Peptide information for frame 1 



ORF from 184 bp to 1449 bp; peptide length: 422 
Category: strong similarity to known protein 



1 MGKAKVPASK RAPSSPVAKP GPVKTLTRKK NKKKKRFWKS KAREVSKKPA 
51 SGPGAVVRPP KAPEDFSQNW KALQEWLLKQ KSQAPEKPLV ISQMGSKKKP 
101 KIIQQNKKET SPQVKGEEMP AGKDQEASRG SVPSGSKMDR RAPVPRTKAS 
151 GTEHNKKGTK ERTNGDIVPE RGDIEHKKRK AKEAAPAPPT EEDIWFDDVD 
201 PADIEAAIGP EAAKIARKQL GQSEGSVSLS LVKEQAFGGL TRALALDCEM 
251 VGVGPKGEES MAARVSIVNQ YGKCVYDKYV KPTEPVTDYR TAVSGIRPEN 
301 LKQGEELEVV QKEVAEMLKG RILVGHALHN DLKVLFLDHP KKKIRDTQKY 
351 KPFKSQVKSG RPSLRLLSEK ILGLQVQQAE HCSIQDAQAA MRLYVMVKKE 
401 WESMARDRRP LLTAPDHCSD DA 
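
The ORF coordinates, reading frames and peptide lengths quoted throughout this listing can be cross-checked arithmetically. The following is a minimal Python sketch and not part of the original analysis; it assumes that the quoted coordinates are 1-based, inclusive, and exclude the stop codon, and the function names are hypothetical.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Number of codons in an ORF given 1-based inclusive coordinates.

    Assumption: the stop codon is not part of the quoted span.
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"span of {span} bp is not a whole number of codons")
    return span // 3


def reading_frame(start_bp: int) -> int:
    """Forward reading frame (1, 2 or 3) of a 1-based start position."""
    return (start_bp - 1) % 3 + 1


if __name__ == "__main__":
    # (start, end) pairs as quoted in the "Peptide information" entries.
    for start, end in [(167, 1396), (184, 1449), (57, 2129)]:
        print(start, end, "frame", reading_frame(start),
              "length", orf_peptide_length(start, end))
```

Under these assumptions the entry above (184 bp to 1449 bp) gives frame 1 and 422 residues, in agreement with the quoted values.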

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_7p10, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_7p10, frame 1 



Report for DKFZphtes3_7p10.1 



[LENGTH] 422 
[MW] 46671.91 
[pI] 9.79 
[HOMOL] PIR:S53818 XPMC2 protein - African clawed frog 7e-96 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YOL080c] 2e-42 
[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YGR276c] 2e-19 
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YGL094c] 7e-13 
[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. cerevisiae, YGL094c] 7e-13 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR107w] 6e-10 
[PROSITE] RGD 1 
[PROSITE] MYRISTYL 4 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 6 
[PROSITE] TYR_PHOSPHO_SITE 2 
[PROSITE] GLYCOSAMINOGLYCAN 1 
[PROSITE] PKC_PHOSPHO_SITE 8 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 11.37 % 




SEQ MGKAKVPASKRAPSSPVAKPGPVKTLTRKKNKKKKRFWKSKAREVSKKPASGPGAVVRPP 

SEG xxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccchhhhhhhhhhhhhhhhhccccccccccccccccc 

SEQ KAPEDFSQNWKALQEWLLKQKSQAPEKPLVISQMGSKKKPKIIQQNKKETSPQVKGEEMP 

SEG xxxxxxxxxxxx 

PRD cccccccchhhhhhhhhhhhhhhcccccccccccccccccceeeecccccccccccccee 
SEQ AGKDQEASRGSVPSGSKMDRRAPVPRTKASGTEHNKKGTKERTNGDIVPERGDIEHKKRK 

SEG xxxxxx 

PRD ecccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhh 

SEQ AKEAAPAPPTEEDIWFDDVDPADIEAAIGPEAAKIARKQLGQSEGSVSLSLVKEQAFGGL 

SEG xxxxxxxxxxxx 

PRD hhhhcccccccceeeecccccchhhhhhccchhhhhhhhhhcccccchhhhhhhhhhhhh 

SEQ TRALALDCEMVGVGPKGEESMAARVSIVNQYGKCVYDKYVKPTEPVTDYRTAVSGIRPEN 

SEG 

PRD hhhcccccccccccccchhhhhhhhhccccccceeeeeeecccccccccccccccccccc 

SEQ LKQGEELEVVQKEVAEMLKGRILVGHALHNDLKVLFLDHPKKKIRDTQKYKPFKSQVKSG 

SEG 

PRD ccccchhhhhhhhhhhhhhcceeeeccchhhhhhhhhcccccccccceeecccccccccc 

SEQ RPSLRLLSEKILGLQVQQAEHCSIQDAQAAMRLYVMVKKEWESMARDRRPLLTAPDHCSD 

SEG 

PRD chhhhhhhhhhhhhhhccccccchhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccc 

SEQ DA 
SEG 

PRD cc 



Prosite for DKFZphtes3_7p10.1 



PS00002  51->55    GLYCOSAMINOGLYCAN  PDOC00002 
PS00004  107->111  CAMP_PHOSPHO_SITE  PDOC00004 
PS00004  156->160  CAMP_PHOSPHO_SITE  PDOC00004 
PS00005  9->12     PKC_PHOSPHO_SITE   PDOC00005 
PS00005  27->30    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  46->49    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  96->99    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  347->350  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  359->362  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  363->366  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  368->371  PKC_PHOSPHO_SITE   PDOC00005 
PS00006  136->140  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  150->154  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  163->167  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  190->194  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  383->387  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  413->417  CK2_PHOSPHO_SITE   PDOC00006 
PS00007  343->351  TYR_PHOSPHO_SITE   PDOC00007 
PS00007  342->351  TYR_PHOSPHO_SITE   PDOC00007 
PS00008  130->136  MYRISTYL           PDOC00008 
PS00008  151->157  MYRISTYL           PDOC00008 
PS00008  221->227  MYRISTYL           PDOC00008 
PS00008  239->245  MYRISTYL           PDOC00008 
PS00016  171->174  RGD                PDOC00016 



(No Pfam data available for DKFZphtes3_7p10.1) 

DKFZphtes3_7p9 



group: nucleic acid management 

DKFZphtes3_7p9 encodes a novel 691 amino acid protein with similarity to human nuclear domain 
10 protein NDP52. 

The nuclear domain 10 (ND10), also described as POD or Kr bodies, is involved in the development of 
acute promyelocytic leukemia and in virus-host interactions. The NDP52 protein is part of this 
complex structure. In vivo, NDP52 is transcribed in all human tissues, but is redistributed 
upon viral infection and interferon treatment. ND10 plays an important role in the viral life 
cycle. 

The novel protein is similar to NDP52. It contains three leucine zippers and an RGD cell 
attachment site. This protein seems to be a novel part of the ND10 complex. 

The new protein can find application in modulation of viral infections and tumour events. 



similarity to nuclear domain 10 protein NDP52 
complete cDNA, complete cds, EST hits 
Sequenced by BMFZ 

Locus: /map="329.1 cR from top of Chr12 linkage group" 
Insert length: 3003 bp 

Poly A stretch at pos. 2957, no polyadenylation signal found 



1 AAGGTGAGGG GAACAGCTGA TCCGTCTGTT GGGAGGACAG ATATCTCAAG 
51 GCCAGGATGG AAGAATCACC ACTAAGCCGG GCACCATCCC GTGGTGGAGT 
101 CAACTTTCTC AATGTAGCCC GGACCTACAT CCCCAACACC AAGGTGGAAT 
151 GTCACTACAC CCTTCCCCCA GGCACCATGC CCAGTGCCAG TGACTGGATT 
201 GGCATCTTCA AGGTGGAGGC TGCCTGTGTT CGGGATTACC ACACATTTGT 
251 GTGGTCTTCC GTGCCTGAAA GTACAACTGA TGGTTCCCCC ATTCACACCA 
301 GTGTCCAGTT CCAAGCCAGC TACCTGCCCA AACCAGGAGC TCAGCTCTAC 
351 CAGTTCCGAT ATGTGAACCG CCAGGGCCAG GTGTGTGGGC AGAGCCCCCC 
401 TTTCCAGTTC CGAGAGCCAA GGCCCATGGA TGAACTGGTG ACCCTGGAGG 
451 AGGCTGATGG GGGCTCTGAC ATCCTGCTGG TTGTCCCCAA GGCAACTGTG 
501 TTACAGAACC AGCTCGATGA GAGCCAGCAA GAACGGAATG ACCTGATGCA 
551 GCTGAAGCTA CAGCTGGAGG GACAGGTGAC AGAGCTGAGG AGCCGAGTGC 
601 AGGAGCTCGA GAGGGCTCTG GCAACTGCCA GGCAGGAGCA CACGGAGCTG 
651 ATGGAACAGT ACAAGGGGAT TTCCCGGTCC CATGGGGAGA TCACAGAAGA 
701 GAGGGACATC CTGAGCCGGC AACAGGGAGA CCATGTGGCA CGCATCCTGG 
751 AGCTAGAGGA TGACATCCAG ACCATCAGTG AGAAAGTGCT GACGAAGGAA 
801 GTGGAGCTGG ACAGGCTTAG AGACACAGTG AAGGCCCTGA CTCGGGAACA 
851 AGAGAAGCTC CTTGGGCAAC TGAAAGAAGT ACAAGCAGAC AAGGAGCAAA 
901 GTGAGGCTGA GCTCCAAGTG GCACAACAGG AGAACCATCA CTTAAATTTG 
951 GACCTGAAGG AGGCGAAGAG CTGGCAAGAG GAGCAGAGTG CTCAGGCTCA 
1001 GCGACTGAAA GACAAGGTGG CCCAGATGAA GGACACCCTA GGCCAGGCCC 
1051 AGCAGCGGGT GGCCGAGCTG GAGCCCTTGA AGGAGCAGCT TCGAGGGGCC 
1101 CAGGAGCTTG CAGCCTCAAG CCAGCAGAAA GCCACCCTTC TTGGGGAGGA 
1151 GTTGGCCAGC GCAGCAGCAG CCAGGGACCG CACCATAGCC GAACTACACC 
1201 GCAGCCGCCT GGAAGTGGCT GAAGTTAACG GCAGGCTGGC TGAGCTCGGT 
1251 TTGCACTTGA AGGAAGAAAA ATGCCAATGG AGCAAGGAGC GGGCAGGGCT 
1301 GCTGCAGAGT GTGGAGGCAG AGAAGGACAA GATCCTGAAG CTGAGTGCAG 
1351 AGATACTTCG ATTGGAGAAG GCAGTTCAGG AGGAGAGGAC CCAAAACCAA 
1401 GTGTTCAAGA CTGAGCTGGC CCGGGAGAAG GATTCTAGCC TGGTACAGTT 
1451 GTCAGAAAGT AAGCGGGAGC TGACAGAGCT GCGGTCAGCC CTGCGTGTGC 
1501 TCCAGAAGGA AAAGGAGCAG TTACAGGAGG AGAAACAGGA ATTGCTAGAG 
1551 TACATGAGAA AGCTAGAGGC CCGCCTGGAG AAGGTGGCAG ATGAGAAGTG 
1601 GAATGAGGAT GCCACCACAG AGGATGAGGA GGCCGCTGTG GGGCTGAGCT 
1651 GCCCGGCAGC TCTGACAGAC TCAGAGGACG AGTCCCCAGA AGACATGAGG 
1701 CTCCCACCCT ATGGCCTTTG TGAGCGTGGA GACCCAGGCT CCTCTCCTGC 
1751 TGGGCCTCGA GAGGCTTCTC CCCTTGTTGT CATCAGCCAG CCGGCTCCCA 
1801 TTTCTCCTCA CCTCTCTGGG CCAGCTGAGG ACAGTAGCTC TGACTCGGAG 
1851 GCTGAAGATG AGAAGTCAGT CCTGATGGCA GCTGTGCAGA GTGGGGGTGA 
1901 GGAGGCCAAC TTACTGCTTC CTGAACTGGG CAGTGCCTTC TATGACATGG 
1951 CCAGTGGCTT TACAGTGGGT ACCCTGTCAG AAACCAGCAC TGGGGGCCCT 
2001 GCCACCCCCA CATGGAAGGA GTGTCCTATC TGTAAGGAGC GCTTTCCTGC 
2051 TGAGAGTGAC AAGGATGCCC TGGAGGACCA CATGGATGGA CACTTCTTTT 
2101 TCAGCACCCA GGACCCCTTC ACCTTTGAGT GATCTTACTC CCTCGTACAT 
2151 GCACAAATAC ACACTCATGC ACACACACAC TCACACACAT GCATACACTT 
2201 AGGTTTCATG CCCATTTTCT ATCACACTGG GCTCCATGAT ATTCTGTTCC 
2251 CTAAGAACTG CTTCTGTGTG CCCTGTTTTC ATCCCAAGAT TTCTCACTTC 
2301 ATCCTCTCCT ACCTGGCTCT TTTGTCCCAG GGAGGGGTCC TGTTCGGAAG 
2351 CAGTGGCTGA ATTTATCCCC TGAAAGTGGT TTTGGAGGAA CCGGGATGGA 
2401 GGAGGCCTTC CCCTGTGGGA ATAGAATCGT CCACTCCTAG CCCTGGTTGC 
2451 TTCTGATACA CAGCCACTGC ACACACACAC TCACACTCAC ACTCCCTTGT 
2501 CTGATGCCCC AAAGCCAATT CCTGGGGCAC CCTACCCTCT CTTATTTGGA 
2551 GTTTCCGTTG GTTTACCTGA GTTTTCTCTG GGGTCTGCAC AGAGGCAGCA 
2601 GCATGGACAT CATGGCCTCT CAGGTCCCTT TTGGTTCTCA GTTTCATTGG 
2651 TTCCTCTTTC TGTTCCCCCA TTGACTTCTG TGCCCCACCC TAGCCTTTTC 
2701 CATAACCTTA GGTATTCAGT TTGGAGGGGT TTTTTGTATT TTTGAGGATT 
2751 CCTGTATTCT GTATCCTCTC CTCGCATCTC CTCACATGGA AAGAAATAAT 
2801 GTATTTGTGC CTTCTGTGAG GAATGGGGGG AACAAGTGGT CCCAGGTATC 
2851 CCCATTTCCA AGGCCCCCCT CCCTCTCCAG GTCCCCCCAC AGCAATAAAA 
2901 GCTTCCCCCT GATATCCATC CCTTTGTAGT TTGAACAAAT ATATTTATAT 
2951 GATATGTAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3001 AAA 



BLAST Results 



Entry HS189353 from database EMBL: 
human STS WI-11261. 
Score = 2191, P = 1.4e-92, identities = 463/485 



Medline entries 



95310349: 

Molecular characterization of NDP52, a novel protein of the 
nuclear domain 10, which is redistributed upon virus 
infection and interferon treatment. 

97375672: 

Cellular localization, expression, and structure of the nuclear 
dot protein 52. 



Peptide information for frame 3 



ORF from 57 bp to 2129 bp; peptide length: 691 
Category: similarity to known protein 
Prosite motifs: RGD (557-560) 
LEUCINE_ZIPPER (163-185) 
LEUCINE_ZIPPER (475-497) 
LEUCINE_ZIPPER (482-504) 



1 MEESPLSRAP SRGGVNFLNV ARTYIPNTKV ECHYTLPPGT MPSASDWIGI 
51 FKVEAACVRD YHTFVWSSVP ESTTDGSPIH TSVQFQASYL PKPGAQLYQF 
101 RYVNRQGQVC GQSPPFQFRE PRPMDELVTL EEADGGSDIL LVVPKATVLQ 
151 NQLDESQQER NDLMQLKLQL EGQVTELRSR VQELERALAT ARQEHTELME 
201 QYKGISRSHG EITEERDILS RQQGDHVARI LELEDDIQTI SEKVLTKEVE 
251 LDRLRDTVKA LTREQEKLLG QLKEVQADKE QSEAELQVAQ QENHHLNLDL 
301 KEAKSWQEEQ SAQAQRLKDK VAQMKDTLGQ AQQRVAELEP LKEQLRGAQE 
351 LAASSQQKAT LLGEELASAA AARDRTIAEL HRSRLEVAEV NGRLAELGLH 
401 LKEEKCQWSK ERAGLLQSVE AEKDKILKLS AEILRLEKAV QEERTQNQVF 
451 KTELAREKDS SLVQLSESKR ELTELRSALR VLQKEKEQLQ EEKQELLEYM 
501 RKLEARLEKV ADEKWNEDAT TEDEEAAVGL SCPAALTDSE DESPEDMRLP 
551 PYGLCERGDP GSSPAGPREA SPLVVISQPA PISPHLSGPA EDSSSDSEAE 
601 DEKSVLMAAV QSGGEEANLL LPELGSAFYD MASGFTVGTL SETSTGGPAT 
651 PTWKECPICK ERFPAESDKD ALEDHMDGHF FFSTQDPFTF E 
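
The RGD and LEUCINE_ZIPPER positions listed under "Prosite motifs" can be located by a simple pattern scan of the peptide. The following is a minimal Python sketch rather than the PROSITE scanning software itself; it assumes the leucine zipper pattern L-x(6)-L-x(6)-L-x(6)-L written as a regular expression, it scans only two excerpts of the peptide shown above, and the function name `find_motif` is hypothetical.

```python
import re


def find_motif(fragment: str, pattern: str, first_residue: int = 1):
    """Return (start, end) pairs, 1-based and inclusive, for every match.

    A zero-width lookahead is used so that overlapping matches are reported.
    `first_residue` is the peptide position of fragment[0].
    """
    hits = []
    for m in re.finditer(f"(?=({pattern}))", fragment):
        start = first_residue + m.start(1)
        end = first_residue + m.end(1) - 1
        hits.append((start, end))
    return hits


# Excerpts copied from the peptide listing above.
ZIPPER_FRAGMENT = "NQLDESQQERNDLMQLKLQLEGQVTELRSRVQELERALAT"  # residues 151-190
RGD_FRAGMENT = "PYGLCERGDPGSSPAGPREA"                         # residues 551-570

# Regular-expression form of L-x(6)-L-x(6)-L-x(6)-L (an assumption of this sketch).
LEUCINE_ZIPPER = "L.{6}L.{6}L.{6}L"

if __name__ == "__main__":
    print(find_motif(ZIPPER_FRAGMENT, LEUCINE_ZIPPER, first_residue=151))
    print(find_motif(RGD_FRAGMENT, "RGD", first_residue=551))
```

The sketch reports the last residue of each match as the end coordinate, so the three-residue RGD motif is returned as 557 to 559, whereas the listing quotes 557-560; the listing's end coordinates therefore appear to lie one position past the last matched residue.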

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_7p9, frame 3 

PIR:A56733 nuclear domain 10 protein NDP52 - human, N = 2, Score = 307, 
P = 7.7e-28 

TREMBL:AB008852_1 gene: "NDP"; product: "NDP52"; Bos taurus mRNA for 
NDP52, complete cds., N = 2, Score = 302, P = 4e-27 

TREMBL:AC004549_1 gene: "WUGSC:H_RG459N13.1"; product: "TXBP151"; Homo 
sapiens BAC clone RG459N13 from 7p15, complete sequence., N = 2, Score 
= 275, P = 2.3e-25 

PIR:G02043 TXBP151 - human, N = 2, Score = 270, P = 8.5e-25 

TREMBL:DM35816_4 gene: "zip"; product: "nonmuscle myosin-II heavy 
chain"; Drosophila melanogaster nonmuscle myosin-II heavy chain (zip) 
gene, complete cds., N = 1, Score = 254, P = 1.4e-17 



>PIR:A56733 nuclear domain 10 protein NDP52 - human 
Length = 446 

HSPs: 



Score = 307 (46.1 bits), Expect = 7.7e-28, Sum P(2) = 7.7e-28 
Identities = 104/323 (32%), Positives = 158/323 (48%) 

Query: 15 VNFLNVARTYIPNTKVECHYTLPPGTMPSASDWIGIFKVEAACVRDYHTFVWSSVPESTT 74 
V F +V + YIP V CHYT +P DWIGIF+V R+Y+TF+W ++P 
Sbjct: 23 VIFNSVEKFYIPGGDVTCHYTFTQHFIPRRKDWIGIFRVGWKTTREYYTFMWVTLPIDLN 82 

Query: 75 DGSPIHTSVQFQASYLPKPGAQLYQFRYVNRQGQVCGQSPPFQFREPRPMDELVTLEEAD 134 
+ S VQF+A YLPK + YQF YV+ G V G S PFQFR D LV + 
Sbjct: 83 NKSAKQQEVQFKAYYLPKDD-EYYQFCYVDEDGVVRGASIPFQFRPENEEDILVVTTQ-- 139 

Query: 135 GGSDILLVVPKATVLQNQ-LDES QQERNDLMQLKLQLEGQVTE-LRSRVQELERALA 189 
G + + K +NQ L +S Q++N MQ +LQ + + E L+S ++LE + 
Sbjct: 140 GEVEEIEQHNKELCKENQELKDSCISLQKQNSDMQAELQKKQEELETLQSINKKLELKVK 199 

Query: 190 TARQE-HTELMEQYKGISRSHGEITEERDI-LSRQQGDHVARILELEDDIQTISEKVLTK 247 
+ TEL+ QK++ E+I+ + Q + E+E +Q +K T+ 
Sbjct: 200 EQKDYWETELL-QLKEQNQKMSSENEKKGIRVDQLQAQLSTQEKEMEKLVQGDQDK--TE 256 

Query: 248 EVE-LDRLRDTVKALTREQEKLLGQLKEVQADKEQSEAELQVAQQENHHLNLDLKEAKSW 306 
++E L + D + EQ K +L++ +Q+E QQE N DL + S 
Sbjct: 257 QLEQLKKENDHLFLSLTEQRKDQKKLEQTVEQMKQNETTAMKKQQELMDENFDLSKRLSE 316 

Query: 307 QEEQSAQAQRLKDKVAQMKDTLGQAQQRV 335 
E QR K+++ D L + R+ 
Sbjct: 317 NEIICNALQRQKERLEGENDLLKRENSRL 345 




Score = 304 (45.6 bits), Expect = 2.1e-27, Sum P(2) = 2.1e-27 
Identities = 98/337 (29%), Positives = 163/337 (48%) 

Query: 15 VNFLNVARTYIPNTKVECHYTLPPGTMPSASDWIGIFKVEAACVRDYHTFVWSSVPESTT 74 
V F +V + YIP V CHYT +P DWIGIF+V R+Y+TF+W ++P 
Sbjct: 23 VIFNSVEKFYIPGGDVTCHYTFTQHFIPRRKDWIGIFRVGWKTTREYYTFMWVTLPIDLN 82 

Query: 75 DGSPIHTSVQFQASYLPKPGAQLYQFRYVNRQGQVCGQSPPFQFREPRPMDELVTLEEAD 134 
+ S VQF+A YLPK + YQF YV+ G V G S PFQFR P +E 
Sbjct: 83 NKSAKQQEVQFKAYYLPKDD-EYYQFCYVDEDGVVRGASIPFQFR PENE 130 

Query: 135 GGSDILLVVPKATVLQNQLDESQQERNDLMQLKLQLEGQVTELRSRVQELERALATARQE 194 
DIL+V Q +++E +Q +L + +L+ L+ + +++ L +QE 
Sbjct: 131 --EDILVVTT QGEVEEIEQHNKELCKENQELKDSCISLQKQNSDMQAELQK-KQE 182 

Query: 195 HTELMEQYKGISRSHGEITEERDILSRQQGDH-VARILELEDDIQTISEKVLTKEVELDR 253 
E ++ I ++ ++ ++Q D+ +L+L++ Q +S + + +D+ 
Sbjct: 183 ELETLQS INKKLELKVKEQKDYWETELLQLKEQNQKMSSENEKMGIRVDQ 232 

Query: 254 LRDTVKALTREQEKLL--GQLKEVQAD---KEQSEAELQVAQQENHHLNLDLKEAKSWQE 308 
L+ + +E EKL+ Q K Q + KE L + +Q L+ + Q 
Sbjct: 233 LQAQLSTQEKEMEKLVQGDQDKTEQLEQLKKENDHLFLSLTEQRKDQKKLEQTVEQMKQN 292 

Query: 309 EQSA--QAQRLKDKVAQMKDTLGQAQQRVAELEPLKEQLRGAQEL 351 
E +A + Q L D+ + L + + L+ KE+L G +L 
Sbjct: 293 ETTAMKKQQELMDENFDLSKRLSENEIICNALQRQKERLEGENDL 337 

Score = 124 (18.6 bits), Expect = 2.3e-06, Sum P(2) = 2.3e-06 
Identities = 53/227 (23%), Positives = 113/227 (49%) 

Query: 138 DILLVVPKATVLQNQLDESQQERNDLMQLKLQLEGQVTELRSRVQELERALATARQEHTE 197 
DIL+V Q +++E +Q +L + +L+ L+ + +++ L +QE E 
Sbjct: 132 DILVVTT QGEVEEIEQHNKELCKENQELKDSCISLQKQNSDMQAELQK-KQEELE 185 

Query: 198 LMEQYKGISRSHGEITEERDILSRQQGDH-VARILELEDDIQTISEKVLTKEVELDRLRD 256 
++ I ++ ++ ++Q D+ +L+L++ Q +S + + +D+L+ 
Sbjct: 186 TLQS INKKLELKVKEQKDYWETELLQLKEQNQKMSSENEKMGIRVDQLQA 235 

Query: 257 TVKALTREQEKLLGQLKEVQADKEQSEAELQVAQQENHHLNLDLKEAKSWQEEQSAQAQR 316 
+ +E EKL VQ D++++E +L+ ++EN HL L L E + Q++ ++ 
Sbjct: 236 QLSTQEKEMEKL VQGDQDKTE-QLEQLKKENDHLFLSLTEQRKDQKKLEQTVEQ 288 

Query: 317 LK-DKVAQMKDTLGQAQQRVAELEPLKEQLRGAQELA-ASSQQKATLLGE 364 

+K ++ MK + Q+ + E L ++L + + A +QK L GE 
Sbjct: 289 MKQNETTAMK KQQELMDENFDLSKRLSENEIICNALQRQKERLEGE 334 



Score = 103 (15.5 bits), Expect = 4.4e-04, Sum P(2) = 4.4e-04 
Identities = 63/278 (22%), Positives = 123/278 (44%) 

Query: 299 DLKEAKSWQEEQSAQAQRLKDKVAQMK DTLGQAQQRVAELEPLKEQLRGAQELAAS 354 
+++E + +E + Q LKD ++ D + Q++ ELE L + + EL 
Sbjct: 141 EVEEIEQHNKELCKENQELKDSCISLQKQNSDMQAELQKKQEELETL-QSINKKLELKVK 199 

Query: 355 SQQKATLLGEELASAAAARDRTIAELHRSRLEVAEVNGRLAELGLHLKEEKCQWSKERAG 414 
Q+ EL + +E + + V ++ +L+ + E+ Q +++ 
Sbjct: 200 EQKD--YWETELLQLKEQNQKMSSENEKMGIRVDQLQAQLSTQEKEM-EKLVQGDQDKTE 256 

Query: 415 LLQSVEAEKDKI-LKLSAEIL---RLEKAVQEERTQNQVFKTELAREKDSSLVQLSESKR 470 
L+ ++ E D + L L+ + +LE+ V E+ QN+ T + ++++ SKR 
Sbjct: 257 QLEQLKKENDHLFLSLTEQRKDQKKLEQTV-EQMKQNET--TAMKKQQELMDENFDLSKR 313 

Query: 471 ELTELRSALRVLQKEKEQLQEEKQELLEYMRKLEARLEKVADEKWNE DATTEDEEAA 527 

L+E LQ++KE+L+ E +LL ++ +RL +N T DE A 

Sbjct: 314 -LSENEIICNALQRQKERLEGEN-DLL KRENSRLLSYMGLDFNSLPYQVPTSDEGGA 368 

Query: 528 VGLSCPAALTD-SEDESPEDMRLPPYGLCERGDPGSSPAGPREASPL 573 

GL+ + E SP + + +C+ D ++ PL 

Sbjct: 369 RQNPGLAYGNPYSGIQESSSPSPLSIKKCPICKADDICDHTLEQQQMQPL 418 

Score = 64 (9.6 bits), Expect = 7.7e-28, Sum P(2) = 7.7e-28 
Identities = 13/29 (44%), Positives = 17/29 (58%) 

Query: 651 PTWKECPICKERFPAESDKDALEDHMDGH 679 

P CPIC + FPA ++K EDH+ H 
Sbjct: 417 PLCFNCPICDKIFPA-TEKQIFEDHVFCH 444 

Score = 64 (9.6 bits), Expect = 5.8e+00, Sum P(2) = 1.0e+00 
Identities = 26/90 (28%), Positives = 45/90 (50%) 

Query: 470 RELTELRSALRVLQKEKEQLQEE KQELLEYMRKLEARLE-KVADEK--W 515 

+E EL+ + LQK+ +Q E KQE LE ++ + +LE KV ++K W 
Sbjct: 154 KENQELKDSCISLQKQNSDMQAELQKKQEELETLQSINKKLELKVKEQKDYWETELLQLK 213 

Query: 516 --NEDATTEDEEAAVGLS-CPAALTDSEDE 542 

N+ ++E+E+ + + A L+ EE 
Sbjct: 214 EQNQKMSSENEKMGIRVDQLQAQLSTQEKE 243 

Score = 47 (7.1 bits), Expect = 4.6e-26, Sum P(2) = 4.6e-26 
Identities = 11/30 (36%), Positives = 17/30 (56%) 

Query: 631 MASGFTVGTLSETSTGGPATPTWKECPICK 660 

+A G + E+S+ P + K+CPICK 

Sbjct: 374 LAYGNPYSGIQESSSPSPLSI--KKCPICK 401 
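
The percentages quoted in the Identities and Positives lines above are consistent with integer truncation rather than rounding; for example, 113/227 is reported as 49% although it rounds to 50%. A minimal Python sketch of that convention follows. It assumes truncation is what the BLAST version behind this listing used, which is inferred from the quoted figures only, and the function name is hypothetical.

```python
def blast_percent(matches: int, aligned_length: int) -> int:
    """Percentage of matching positions, truncated to a whole number.

    Assumption: the listing truncates (floors) instead of rounding.
    """
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned_length


if __name__ == "__main__":
    # (matches, aligned length) pairs copied from the HSP headers above.
    for matches, length in [(104, 323), (113, 227), (63, 278), (13, 29), (17, 29)]:
        print(f"{matches}/{length} ({blast_percent(matches, length)}%)")
```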



Pedant information for DKFZphtes3_7p9, frame 3 
Report for DKFZphtes3_7p9.3 



[LENGTH] 691 
[MW] 77336.52 
[pI] 4.77 
[HOMOL] PIR:A56733 nuclear domain 10 protein NDP52 - human 2e-29 
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YDR356w] 2e-11 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YDR356w] 2e-11 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 2e-11 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YDR356w] 2e-11 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 2e-11 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR309c] 2e-08 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-07 
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-07 
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-07 
[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YJL074c] 4e-07 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YNL250w] 4e-06 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YBR289w] 4e-06 
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YBR289w] 4e-06 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YBR289w] 4e-06 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YNL250w] 4e-06 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YNL250w] 4e-06 
[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1643] 1e-05 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YJR134c] 4e-05 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 4e-05 
[FUNCAT] 08.19 cellular import [S. cerevisiae, YNL243w] 7e-05 
[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YNL243w] 7e-05 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YNL243w] 7e-05 
[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YNL079c] 2e-04 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YNL079c] 2e-04 
[BLOCKS] BL00682B ZP domain proteins 
[EC] 3.6.1.32 Myosin ATPase 1e-13 
[PIRKW] nucleus 6e-10 
[PIRKW] phosphotransferase 2e-07 
[PIRKW] duplication 9e-07 
[PIRKW] citrulline 1e-09 
[PIRKW] tandem repeat 1e-13 
[PIRKW] heart 5e-11 
[PIRKW] endocytosis 5e-09 
[PIRKW] polymorphism 3e-06 
[PIRKW] cornified cell envelope 1e-06 
[PIRKW] transmembrane protein 6e-12 
[PIRKW] serine/threonine-specific protein kinase 2e-07 
[PIRKW] cell wall 1e-06 
[PIRKW] zinc finger 5e-09 
[PIRKW] metal binding 5e-09 
[PIRKW] DNA binding 8e-08 
[PIRKW] muscle contraction 1e-11 
[PIRKW] IgG constant region-binding 1e-06 
[PIRKW] acetylated amino end 4e-09 
[PIRKW] actin binding 1e-13 
[PIRKW] mitosis 9e-09 
[PIRKW] microtubule binding 9e-09 
[PIRKW] ATP 1e-13 
[PIRKW] thick filament 1e-10 
[PIRKW] phosphoprotein 1e-13 
[PIRKW] epidermis 1e-06 
[PIRKW] leucine zipper 1e-07 
[PIRKW] glycoprotein 4e-07 
[PIRKW] skeletal muscle 4e-10 
[PIRKW] disulfide bond 1e-07 
[PIRKW] calcium binding 1e-09 
[PIRKW] alternative splicing 1e-10 
[PIRKW] coiled coil 1e-13 
[PIRKW] P-loop 1e-13 
[PIRKW] heptad repeat 6e-10 
[PIRKW] methylated amino acid 1e-13 
[PIRKW] basement membrane 3e-06 
[PIRKW] immunoglobulin receptor 2e-07 
[PIRKW] peripheral membrane protein 5e-09 
[PIRKW] dimer 1e-07 
[PIRKW] cardiac muscle 1e-10 
[PIRKW] extracellular matrix 3e-06 
[PIRKW] hydrolase 1e-13 
[PIRKW] microtubule 6e-10 
[PIRKW] muscle 2e-09 
[PIRKW] membrane protein 3e-06 
[PIRKW] EF hand 1e-09 
[PIRKW] cytoskeleton 6e-12 
[PIRKW] hair 1e-09 
[PIRKW] calmodulin binding 5e-09 
[PIRKW] Golgi apparatus 3e-08 
[SUPFAM] myosin heavy chain 1e-13 
[SUPFAM] conserved hypothetical P115 protein 1e-08 
[SUPFAM] hypothetical protein YJL074c 5e-07 
[SUPFAM] centromere protein E 9e-09 
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 2e-07 
[SUPFAM] calmodulin repeat homology 1e-09 
[SUPFAM] myosin motor domain homology 1e-13 
[SUPFAM] alpha-actinin actin-binding domain homology 3e-13 
[SUPFAM] tropomyosin 3e-07 
[SUPFAM] plectin 3e-13 
[SUPFAM] trichohyalin 1e-09 
[SUPFAM] pleckstrin repeat homology 4e-06 
[SUPFAM] ribosomal protein S10 homology 3e-13 
[SUPFAM] giantin 3e-08 
[SUPFAM] protein kinase homology 2e-07 
[SUPFAM] protein kinase C zinc-binding repeat homology 4e-06 
[SUPFAM] involucrin 1e-06 
[SUPFAM] kinesin motor domain homology 9e-09 
[SUPFAM] human early endosome antigen 1 5e-09 
[SUPFAM] unassigned kinesin-related proteins 8e-08 
[SUPFAM] M5 protein 3e-08 
[SUPFAM] cytoskeletal keratin 3e-08 
[PROSITE] LEUCINE_ZIPPER 3 
[PROSITE] RGD 1 
[PROSITE] MYRISTYL 6 
[PROSITE] CK2_PHOSPHO_SITE 25 
[PROSITE] PKC_PHOSPHO_SITE 6 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 9.12 % 
[KW] COILED_COIL 39.36 % 



SEQ MEESPLSRAPSRGGVNFLNVARTYIPNTKVECHYTLPPGTMPSASDWIGIFKVEAACVRD 

SEG 

PRD cccccccccccccceeeecceeeeeccccceeeeeccccccccccceeeeeeeeeecccc 

COILS 

SEQ YHTFVWSSVPESTTDGSPIHTSVQFQASYLPKPGAQLYQFRYVNRQGQVCGQSPPFQFRE 

SEG 

PRD eeeeeeeecccccccccchhhhhhhhhhhhccccccceeeeecccccccccccccccccc 

COILS 

SEQ PRPMDELVTLEEADGGSDILLVVPKATVLQNQLDESQQERNDLMQLKLQLEGQVTELRSR 

SEG 

PRD cccccceeehhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ VQELERALATARQEHTELMEQYKGISRSHGEITEERDILSRQQGDHVARILELEDDIQTI 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ SEKVLTKEVELDRLRDTVKALTREQEKLLGQLKEVQADKEQSEAELQVAQQENHHLNLDL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ KEAKSWQEEQSAQAQRLKDKVAQMKDTLGQAQQRVAELEPLKEQLRGAQELAASSQQKAT 

SEG xx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCC . . CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC . CCCCCCCCCCCCCCCCCCCC 

SEQ LLGEELASAAAARDRTIAELHRSRLEVAEVNGRLAELGLHLKEEKCQWSKERAGLLQSVE 

SEG xxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCC CCCCCCCCCCC 

SEQ AEKDKILKLSAEILRLEKAVQEERTQNQVFKTELAREKDSSLVQLSESKRELTELRSALR 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCC 

SEQ VLQKEKEQLQEEKQELLEYMRKLEARLEKVADEKWNEDATTEDEEAAVGLSCPAALTDSE 

SEG xxxxxxxxxxxxxxxxx xxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ DESPEDMRLPPYGLCERGDPGSSPAGPREASPLVVISQPAPISPHLSGPAEDSSSDSEAE 

SEG xxxxxxxxxxx 

PRD hhhhccccccccccccccccccccccccccceeeeeeccccccccccccccccccccchh 

COILS 



SEQ DEKSVLMAAVQSGGEEANLLLPELGSAFYDMASGFTVGTLSETSTGGPATPTWKECPICK 

SEG xx 

PRD hhhhhhhhhhhhcccccccccccccccccccccccccccccccccccccccccccccccc 

COILS 

SEQ ERFPAESDKDALEDHMDGHFFFSTQDPFTFE 

SEG 

PRD cccccccchhhhhhhccccceeecccccccc 

COILS 






Prosite for DKFZphtes3_7p9.3

PS00005   190->193   PKC_PHOSPHO_SITE   PDOC00005
PS00005   241->244   PKC_PHOSPHO_SITE   PDOC00005
PS00005   257->260   PKC_PHOSPHO_SITE   PDOC00005
PS00005   468->471   PKC_PHOSPHO_SITE   PDOC00005
PS00005   652->655   PKC_PHOSPHO_SITE   PDOC00005
PS00005   667->670   PKC_PHOSPHO_SITE   PDOC00005
PS00006   28->32     CK2_PHOSPHO_SITE   PDOC00006
PS00006   43->47     CK2_PHOSPHO_SITE   PDOC00006
PS00006   68->72     CK2_PHOSPHO_SITE   PDOC00006
PS00006   72->76     CK2_PHOSPHO_SITE   PDOC00006
PS00006   129->133   CK2_PHOSPHO_SITE   PDOC00006
PS00006   156->160   CK2_PHOSPHO_SITE   PDOC00006
PS00006   208->212   CK2_PHOSPHO_SITE   PDOC00006
PS00006   239->243   CK2_PHOSPHO_SITE   PDOC00006
PS00006   282->286   CK2_PHOSPHO_SITE   PDOC00006
PS00006   305->309   CK2_PHOSPHO_SITE   PDOC00006
PS00006   376->380   CK2_PHOSPHO_SITE   PDOC00006
PS00006   383->387   CK2_PHOSPHO_SITE   PDOC00006
PS00006   468->472   CK2_PHOSPHO_SITE   PDOC00006
PS00006   520->524   CK2_PHOSPHO_SITE   PDOC00006
PS00006   537->541   CK2_PHOSPHO_SITE   PDOC00006
PS00006   539->543   CK2_PHOSPHO_SITE   PDOC00006
PS00006   543->547   CK2_PHOSPHO_SITE   PDOC00006
PS00006   593->597   CK2_PHOSPHO_SITE   PDOC00006
PS00006   595->599   CK2_PHOSPHO_SITE   PDOC00006
PS00006   597->601   CK2_PHOSPHO_SITE   PDOC00006
PS00006   612->616   CK2_PHOSPHO_SITE   PDOC00006
PS00006   639->643   CK2_PHOSPHO_SITE   PDOC00006
PS00006   652->656   CK2_PHOSPHO_SITE   PDOC00006
PS00006   667->671   CK2_PHOSPHO_SITE   PDOC00006
PS00006   683->687   CK2_PHOSPHO_SITE   PDOC00006
PS00008   39->45     MYRISTYL           PDOC00008
PS00008   107->113   MYRISTYL           PDOC00008
PS00008   204->210   MYRISTYL           PDOC00008
PS00008   414->420   MYRISTYL           PDOC00008
PS00008   561->567   MYRISTYL           PDOC00008
PS00008   613->619   MYRISTYL           PDOC00008
PS00016   557->560   RGD                PDOC00016
PS00029   163->185   LEUCINE_ZIPPER     PDOC00029
PS00029   475->497   LEUCINE_ZIPPER     PDOC00029
PS00029   482->504   LEUCINE_ZIPPER     PDOC00029



(No Pfam data available for DKFZphtes3_7p9.3) 
DKFZphtes3_8e24 



group: signal transduction 

DKFZphtes3_8e24.3 encodes a novel 658 amino acid putative GTP-binding protein, related to 
yeast YGL099w and mouse MMR1 putative GTP-binding proteins. 

GTP-binding proteins are involved in various signal transduction pathways, transferring the 
signal of a cellular receptor to an intracellular signal cascade. 

The new protein can find clinical application in modulating/blocking the response to a 
cellular receptor. 

strong similarity to guanine nucleotide binding proteins 

complete cDNA, complete cds, potential start at Bp 21, EST hits 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 3290 bp 

Poly A stretch at pos. 3269, polyadenylation signal at pos. 3251 

1 CGTCCAGCGG TCGTGTTGCC ATGGGCCGGA GGAGAGCCCC GGCCGGTGGG 
51 TCGCTGGGAC GGGCCCTTAT GCGCCATCAG ACTCAGCGGA GCCGAAGCCA 

101 TCGTCACACT GACTCCTGGT TGCACACAAG TGAACTCAAT GATGGCTATG 

151 ATTGGGGTCG TCTTAATCTT CAGTCAGTGA CTGAACAGAG CTCCCTTGAT 

201 GACTTCCTTG CTACTGCAGA ACTTGCAGGA ACAGAGTTTG TAGCTGAAAA 

251 ACTTAATATT AAGTTTGTGC CTGCTGAGGC TAGAACTGGA CTACTGTCTT 

301 TCGAGGAGAG CCAGAGAATT AAGAAGCTCC ATGAAGAAAA CAAACAGTTC 

351 TTGTGTATAC CGAGGAGACC AAACTGGAAC CAAAATACTA CCCCAGAAGA 

401 ACTCAAACAA GCAGAGAAAG ATAACTTTCT AGAATGGAGA CGTCAGCTTG 

451 TCCGGCTAGA AGAGGAACAG AAGCTGATAT TGACTCCATT TGAACGAAAT 

501 TTGGACTTTT GGCGCCAGCT CTGGAGAGTC ATTGAGAGAA GTGATATTGT 

551 GGTCCAGATA GTAGATGCTC GAAACCCACT CCTGTTTAGA TGTGAGGATT 

601 TGGAATGTTA TGTGAAAGAA ATGGATGCCA ATAAGGAGAA CGTCATTCTG 

651 ATCAACAAGG CAGACTTGCT GACTGCTGAG CAGCGGAGTG CCTGGGCCAT 

701 GTACTTCGAA AAAGAAGATG TGAAGGTTAT TTTCTGGTCA GCTTTGGCCG 

751 GAGCCATTCC CCTGAATGGT GACTCTGAGG AAGAGGCAAA CAGAGATGAT 

801 AGACAAAGCA ACACAACTGA GTTTGGACAT TCCAGTTTCG ACCAGGCTGA 

851 AATTTCCCAC AGTGAATCCG AACATCTCCC AGCTAGGGAT TCTCCTTCAC 

901 TTAGTGAAAA TCCCACAACG GATGAAGATG ACAGTGAGTA TGAGGACTGT 

951 CCAGAGGAGG AGGAAGACGA CTGGCAGACG TGCTCAGAAG AAGACGGTCC 
1001 CAAGGAAGAG GACTGCAGCC AGGACTGGAA GGAAAGCTCT ACTGCAGATT 
1051 CTGAGGCTCG GAGCAGGAAA ACCCCACAGA AGAGGCAGAT ACACAATTTT 
1101 AGCCATCTGG TATCCAAGCA GGAGTTACTG GAGCTCTTTA AGGAGCTACA 
1151 CACTGGGAGA AAGGTGAAAG ATGGGCAACT TACGGTCGGA CTGGTGGGCT 
1201 ACCCTAATGT TGGTAAGAGT TCAACAATCA ACACCATCAT GGGCAACAAG 
1251 AAAGTATCTG TGTCTGCCAC ACCTGGTCAC ACAAAGCACT TTCAGACTCT 
1301 CTATGTGGAG CCTGGCCTCT GCCTGTGTGA CTGTCCTGGC TTGGTGATGC 
1351 CATCTTTTGT GTCTACCAAG GCAGAAATGA CTTGCAGCGG AATCCTCCCA 
1401 ATTGATCAGA TGAGAGATCA TGTTCCTCCT GTATCACTAG TTTGCCAGAA 
1451 TATTCCAAGA CATGTTTTAG AAGCTACCTA TGGCATTAAC ATCATAACGC 
1501 CTAGAGAGGA TGAAGATCCC CACCGACCTC CAACATCGGA AGAACTGTTG 
1551 ACAGCTTATG GATACATGCG AGGATTCATG ACAGCGCATG GACAGCCAGA 
1601 CCAGCCTCGA TCTGCGCGCT ACATCCTGAA GGACTATGTC AGTGGTAAGC 
1651 TGCTGTACTG CCATCCTCCT CCTGGAAGAG ATCCTGTAAC TTTTCAGCAT 
1701 CAACACCAGC GACTCCTAGA GAACAAAATG AACAGTGATG AAATAAAAAT 
1751 GCAGCTAGGC AGAAATAAAA AAGCAAAGCA GATTGAAAAT ATCGTTGACA 
1801 AAACTTTTTT CCATCAAGAG AATGTGAGGG CTTTGACCAA AGGAGTCCAG 
1851 GCTGTGATGG GTTACAAGCC CGGGAGTGGT GTAGTGACTG CATCCACTGC 
1901 GAGCTCTGAG AACGGGGCGG GGAAGCCCTG GAAAAAACAT GGCAACAGAA 
1951 ATAAAAAAGA AAAAAGTCGT AGACTCTACA AGCACCTGGA TATGTGAGGT 
2001 TGGGCTGCAA CAGAAATGTC ATCTGCATTG TGCAGATGGA AAAGAGCAGA 
2051 AGCTGCCTGT TGCCTGTGGA ACTGTCCCAA GACACTAGCA CTGTAGAACG 
2101 GGCCCTGCTC TTGCAGAGCA CGGCTGCACC CAACAGTCTC CATGTCAAGA 
2151 CCAAGGGCCT CCTGGAAACA CCAGCTCTGA CAAAAAGGAG TCATCTGGGA 
2201 GCCCGAGAAT CCTACTCCTG GCCGGGCACA GTGGCTCACG CACCAACATG 
2251 GAGAAACCCC GTCTCTACTA AAAATACAAA AAAATTAGCC AGGCGTGGTG 
2301 GCGCGCACCT GTAATCCCAG CTACTCGGGA GGCTGAGGCA GGAGAATCAC 
2351 TTGAACCAGG GAGGCAGAGT TTGCAGTGAA TGGAGATTGC GCCGCTGCAC 
2401 TCCAGCCTGG GCGACAGAGT GAGACTGCAT CACAAGAAAA AAAATTTGCA 
2451 AGGGATGGTT CACGAGACAC ATTTGGGACG AAGGTGAAAG AGAAATTCCC 
2501 CATTCTGAGT GTCCTAGTTG GGTTCCTCCG ACTCTAAACA AGGGACTTGG 
2551 GTTCAGTTAG TGTACAGCGG GGGCTCACGT CCACTAAGGA ACATGTAGAA 
2601 TGTAACCACC GGGTGACAGG GAAGCTGCGG TATTTACTAC CTAGCCCCCA 
2651 TCTTCACTGG TTATTCCACT TATTTAAAAT GTCCAGAATA AGCAAATCTC 
2701 CATATAGAGG AAGTAGATTA GTGGTTGCTT CGGGATGGGA GGAATGGGAA 
2751 GATTGAGGTC TTTCTTTTGC AGTGATAAAA ATGTCCTAAA ATTGACTGTA 
2801 GCGATGGTCA CACAACTCTG AATATGCTTA AGACCATTGA ATTACACACT 
2851 TTACGTTGGT GAATTGTATG GTATGTAAAT TATAGTTCAA TAACATAGTT 
2901 ACAAAAGATA ATCAAAAGCA TGAAAGCACT ATTGATGTGG TTTGGATCTG 
2951 TGTCCTCACC GAGTCTCATG TTGAAATGTA AGCCCCCTGG TGGGAGGCGA 
3001 TGGGATTATG GGGCAGAGTC CTCACAAACG GTTTAGCACC ACCCGCTCAG 
3051 TGCTGTTCTC CTGATATTGA GTCCTCATCA CATCTGGTTG CTTCAAAGTG 
3101 TGTGGTGCCT CCCCTCTGTC TCCCTCCTGC TCTGGCCATA TAAGATGTGC 
3151 CTGCTTCTCC TTCGCCTTCT AACATGATTG TAAGTTTCCT GAGGCCTCCC 
3201 TAGAAGCAAA AGCTGCTGTG CTTCCTGTAC CATCTACTGG ACCGTGAGCC 
3251 AATTAAACCT CTTTTCTTTA TAAAAAAAAA AAAAAAAAGG 



BLAST Results 

No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 21 bp to 1994 bp; peptide length: 658 
Category: strong similarity to known protein 
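The ORF coordinates and the peptide length quoted above can be cross-checked against the cDNA listing. The following is a minimal sketch, assuming 1-based inclusive coordinates that exclude the stop codon and the standard genetic code; the helper names `orf_peptide_length` and `translate` are illustrative and not part of the original analysis.

```python
# Standard genetic code, codons enumerated in the order TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_peptide_length(start, end):
    """Number of codons in an ORF given 1-based inclusive coordinates (assumed convention)."""
    n = end - start + 1
    if n % 3:
        raise ValueError("ORF length is not a multiple of three")
    return n // 3


def translate(dna):
    """Translate a DNA string (spaces allowed) codon by codon from its first base."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Bases 21-50 of the DKFZphtes3_8e24 cDNA listing above.
first_codons = "ATGGGCCGGA GGAGAGCCCC GGCCGGTGGG"

print(orf_peptide_length(21, 1994))  # 658
print(translate(first_codons))       # MGRRRAPAGG
```

Under these assumptions the coordinates 21 to 1994 give 658 codons, and the first ten codons reproduce the first ten residues of the peptide listed below.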



1 MGRRRAPAGG SLGRALMRHQ TQRSRSHRHT DSWLHTSELN DGYDWGRLNL 
51 QSVTEQSSLD DFLATAELAG TEFVAEKLNI KFVPAEARTG LLSFEESQRI 
101 KKLHEENKQF LCIPRRPNWN QNTTPEELKQ AEKDNFLEWR RQLVRLEEEQ 
151 KLILTPFERN LDFWRQLWRV IERSDIVVQI VDARNPLLFR CEDLECYVKE 
201 MDANKENVIL INKADLLTAE QRSAWAMYFE KEDVKVIFWS ALAGAIPLNG 
251 DSEEEANRDD RQSNTTEFGH SSFDQAEISH SESEHLPARD SPSLSENPTT 
301 DEDDSEYEDC PEEEEDDWQT CSEEDGPKEE DCSQDWKESS TADSEARSRK 
351 TPQKRQIHNF SHLVSKQELL ELFKELHTGR KVKDGQLTVG LVGYPNVGKS 
401 STINTIMGNK KVSVSATPGH TKHFQTLYVE PGLCLCDCPG LVMPSFVSTK 
451 AEMTCSGILP IDQMRDHVPP VSLVCQNIPR HVLEATYGIN IITPREDEDP 
501 HRPPTSEELL TAYGYMRGFM TAHGQPDQPR SARYILKDYV SGKLLYCHPP 
551 PGRDPVTFQH QHQRLLENKM NSDEIKMQLG RNKKAKQIEN IVDKTFFHQE 
601 NVRALTKGVQ AVMGYKPGSG VVTASTASSE NGAGKPWKKH GNRNKKEKSR 
651 RLYKHLDM 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_8e24, frame 3 

SWISSPROT:YAWG_SCHPO HYPOTHETICAL GTP-BINDING PROTEIN C3F10.16C IN 
CHROMOSOME I., N = 3, Score = 560, P = 1.6e-111 

PIR:S64106 hypothetical protein YGL099w - yeast (Saccharomyces 
cerevisiae), N = 2, Score = 544, P = 2.6e-105 

TREMBL:CEAF3143_1 gene: "C53H9.2"; Caenorhabditis elegans cosmid 
C53H9., N = 1, Score = 551, P = 2.9e-53 

SWISSPROT:MMR1_MOUSE POSSIBLE GTP-BINDING PROTEIN MMR1., N = 2, Score = 
311, P = 7.5e-31 

>SWISSPROT:YAWG_SCHPO HYPOTHETICAL GTP-BINDING PROTEIN C3F10.16C IN 
CHROMOSOME I. 

Length = 616 

HSPs: 

Score = 560 (84.0 bits), Expect = 1.6e-111, Sum P(3) = 1.6e-111 
Identities = 119/253 (47%), Positives = 163/253 (64%) 

Query: 12 LGRALMRHQTQRSRSHRHTDSWLHTSELNDGYDWGRLNLQSVTEQSSLDDFLATAELAGT 71 
          LGRA+ T+ R+ + H + + R L+SVT ++ LD+FL TAEL 
Sbjct: 12 LGRAIQSDFTKNRRNRK--GGLKHIVDSDPKAH--RAALRSVTHETDLDEFLNTAELGEV 67 

Query: 72 EFVAEKLNIKFVP-AEARTGLLSFEESQRIKKLHEENKQFLCIPRRPNWNQNTTPEELKQ 130 
          EF+AEK N+ + E LLS EE+ R K+ E+NK L IPRRP+W+Q TT EL + 
Sbjct: 68 EFIAEKQNVTVIQNPEQNPFLLSKEEAARSKQKQEKNKDRLTIPRRPHWDQTTTAVELDR 127 

Query: 131 AEKDNFLEWRRQLVRLEEEQKLILTPFERNLDFWRQLWRVIERSDIVVQIVDARNPLLFR 190 
           E+++FL WRR L +L++ + I+TPFERNL+ WRQLWRVIERSD+VVQIVDARNPL FR 
Sbjct: 128 MERESFLNWRRNLAQLQDVEGFIVTPFERNLEIWRQLWRVIERSDVVVQIVDARNPLFFR 187 

Query: 191 CEDLECYVKEMDANKENVILINKADLLTAEQRSAWAMYFEKEDVKVIFWSALAGAIPLNG 250 
           LE YVKE+ +K+N +L+NKAD+LT EQR+ W+ YF + ++ +F+SA A N 
Sbjct: 188 SAHLEQYVKEVGPSKKNFLLVNKADMLTEEQRNYWSSYFNENNIPFLFFSARMAA-EANE 246 

Query: 251 DSEEEANRDDRQSN 264 
           E+ + SN 
Sbjct: 247 RGEDLETYESTSSN 260 

Score = 532 (79.8 bits), Expect = 1.6e-111, Sum P(3) = 1.6e-111 
Identities = 131/323 (40%), Positives = 192/323 (59%) 

Query: 340 STADSEARSRKTPQKRQIHNFSHLVSKQELLELFKELHTGRKVKDGQ--LTVGLVGYPNV 397 
           ST+ +E + +H+ S + + + L +F++ + + DG+ +T GLVGYPNV 
Sbjct: 256 STSSNEIPESLQADENDVHS-SRIATLKVLEGIFEKFAS--TLPDGKTKMTFGLVGYPNV 312 

Query: 398 GKSSTINTIMGNKKVSVSATPGHTKHFQTLYVEPGLCLCDCPGLVMPSFVSTKAEMTCSG 457 
           GKSSTIN ++G+KKVSVS+TPG TKHFQT+ + + L DCPGLV PSF +T+A++ G 
Sbjct: 313 GKSSTINALVGSKKVSVSSTPGKTKHFQTINLSEKVSLLDCPGLVFPSFATTQADLVLDG 372 

Query: 458 ILPIDQMRDHVPPVSLVCQNIPRHVLEATYGINI-ITPREDEDPHRPPTSEELLTAYGYM 516 
           +LPIDQ+R++ P +L+ + IP+ VLE Y I I I P E E P+++E+L + 
Sbjct: 373 VLPIDQLREYTGPSALMAERIPKEVLETLYTIRIRIKPIE-EGGTGVPSAQEVLFPFARS 431 

Query: 517 RGFMTAH-GQPDQPRSARYILKDYVSGKLLYCHPPPG--RDPVTFQHQHQRLLENKMNSD 573 
           RGFM AH G PD R+AR +LKDYV+GKLLY HPPP F +H + + + SD 
Sbjct: 432 RGFMRAHHGTPDDSRAARILLKDYVNGKLLYVHPPPNYPNSGSEFNKEHHQKIVSA-TSD 490 

Query: 574 EIKMQLGR---NKKAKQIEN-IVDKTFFHQEN--VRALTKGVQAVM-G--YKPGSGVVTA 624 
           I +L R + E+ +VD +F QEN VR + KG M G YK + + 
Sbjct: 491 SITEKLQRTAISDNTLSAESQLVDDEYF-QENPHVRPMVKGTAVAMQGPVYKGRNTMQPF 549 

Query: 625 STASSENGAGK-PWKKHGNRNKKEKSRRL 652 
           +++ + K P G + K+R+L 
Sbjct: 550 QRRLNDDASPKYPMNAQGKPLSRRKARQL 578 

Score = 47 (7.1 bits), Expect = 1.3e-60, Sum P(3) = 1.3e-60 
Identities = 21/84 (25%), Positives = 35/84 (41%) 

Query: 552 GRDPVTFQHQHQRLLENKMNSDEIKMQLGRNKKAKQIENIVDKTFFHQENVRALTKGVQA 611 
           G D T++ + + +DE + R K +E I +K F TK 
Sbjct: 248 GEDLETYESTSSNEIPESLQADENDVHSSRIATLKVLEGIFEK--FASTLPDGKTKMTFG 305 

Query: 612 VMGYKPGSGVVTASTASSENGAGK 635 
           ++GY P G +ST ++ G+ K 
Sbjct: 306 LVGY-PNVG--KSSTINALVGSKK 326 

Score = 43 (6.5 bits), Expect = 1.6e-111, Sum P(3) = 1.6e-111 
Identities = 7/13 (53%), Positives = 9/13 (69%) 

Query: 638 KKHGNRNKKEKSR 650 
           KKH +NK+ K R 
Sbjct: 596 KKHNKKNKRSKQR 608 

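The "Identities" figures in the HSP headers can be recomputed from an aligned Query/Sbjct pair. The sketch below is illustrative only: the function name `hsp_identities` is hypothetical, and rounding the percentage down is an assumption inferred from the values printed in this listing (for example 7/13 shown as 53%).

```python
def hsp_identities(query, sbjct):
    """Count identical aligned residues in two gapped, equal-length strings.

    Returns (identities, alignment_length, percent); the percentage is
    truncated, which is assumed to match the convention of this listing.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned strings must have the same length")
    identical = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    length = len(query)
    return identical, length, (100 * identical) // length


# Last HSP above: query residues 638-650 against subject residues 596-608.
print(hsp_identities("KKHGNRNKKEKSR", "KKHNKKNKRSKQR"))  # (7, 13, 53)
```

The "Positives" count additionally depends on the substitution matrix used by the search program, which is not stated in this section, so it is not reproduced here.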


Pedant information for DKFZphtes3_8e24, frame 3 

Report for DKFZphtes3_8e24.3 

[LENGTH]   658 
[MW]       75226.58 
[pI]       5.86 
[HOMOL]    SWISSPROT:YAWG_SCHPO HYPOTHETICAL GTP-BINDING PROTEIN C3F10.16C IN CHROMOSOME I. 5e-56 
[FUNCAT]   99 unclassified proteins [S. cerevisiae, YGL099w] 3e-55 
[FUNCAT]   r general function prediction [M. jannaschii, MJ1464] 1e-16 
[FUNCAT]   08.16 extracellular transport [S. cerevisiae, YER006w] 3e-09 
[PIRKW]    P-loop 1e-27 
[PIRKW]    GTP binding 1e-27 
[SUPFAM]   conserved hypothetical protein MG442 7e-08 
[PROSITE]  ATP_GTP_A           1 
[PROSITE]  MYRISTYL            3 
[PROSITE]  AMIDATION           2 
[PROSITE]  CAMP_PHOSPHO_SITE   1 
[PROSITE]  CK2_PHOSPHO_SITE   19 
[PROSITE]  TYR_PHOSPHO_SITE    2 
[PROSITE]  PKC_PHOSPHO_SITE   10 
[PROSITE]  ASN_GLYCOSYLATION   2 
[KW]       Alpha_Beta 
[KW]       LOW_COMPLEXITY   4.56 % 

SEQ MGRRRAPAGGSLGRALMRHQTQRSRSHRHTDSWLHTSELNDGYDWGRLNLQSVTEQSSLD 

SEG xxxxxxxxxxxxx 

PRD cccccccccccchhhhhhhhhhhccccccccccccccccccccccchhhhhhhhccccch 

SEQ DFLATAELAGTEFVAEKLNIKFVPAEARTGLLSFEESQRIKKLHEENKQFLCIPRRPNWN 

SEG 

PRD hhhhhhhhhhheeeecccceeeeeeccccccchhhhhhhhhhhhhhhhhhhccccccccc 

SEQ QNTTPEELKQAEKDNFLEWRRQLVRLEEEQKLILTPFERNLDFWRQLWRVIERSDIVVQI 

SEG 

PRD cccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhcceeeee 

SEQ VDARNPLLFRCEDLECYVKEMDANKENVILINKADLLTAEQRSAWAMYFEKEDVKVIFWS 

SEG 

PRD eccccccccchhhhhhhhhhhccccceeeeecccchhhhhhhhhhhhhhhhccceeeeec 

SEQ ALAGAIPLNGDSEEEANRDDRQSNTTEFGHSSFDQAEISHSESEHLPARDSPSLSENPTT 

SEG 

PRD cccccccccccchhhhhhhhhhcccccccccccccccccccccccccccccccccccccc 

SEQ DEDDSEYEDCPEEEEDDWQTCSEEDGPKEEDCSQDWKESSTADSEARSRKTPQKRQIHNF 

SEG xxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccchhhhhhhhhccccccccccc 

SEQ SHLVSKQELLELFKELHTGRKVKDGQLTVGLVGYPNVGKSSTINTIMGNKKVSVSATPGH 

SEG 

PRD ccccchhhhhhhhhhhhhhhccccceeeeeecccccccccceeeeccccceeeeeccccc 

SEQ TKHFQTLYVEPGLCLCDCPGLVMPSFVSTKAEMTCSGILPIDQMRDHVPPVSLVCQNIPR 

SEG 

PRD cceeeeeeeccceeecccccccccccchhhhhhhhccccccccccccccceeeeecccch 

SEQ HVLEATYGINIITPREDEDPHRPPTSEELLTAYGYMRGFMTAHGQPDQPRSARYILKDYV 

SEG 

PRD hhhhhhhhccccccccccccccccchhhhhhhhhhhhhhcccccccccchhhhhhhhhcc 

SEQ SGKLLYCHPPPGRDPVTFQHQHQRLLENKMNSDEIKMQLGRNKKAKQIENIVDKTFFHQE 

SEG 

PRD ccceeeeccccccccccchhhhhhhhhhcccchhhhhhhhcchhhhhhhhhhhhccccch 

SEQ NVRALTKGVQAVMGYKPGSGVVTASTASSENGAGKPWKKHGNRNKKEKSRRLYKHLDM 

SEG 

PRD hhhhhhhceeeeeecccccceeecccccccccccccccccccccchhhhhhhhhhccc 

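The LOW_COMPLEXITY percentage in the report appears to be the fraction of residues masked with "x" on the SEG lines. This reading is an inference from the numbers in this listing, not something the report states, and the helper name `masked_percent` is hypothetical.

```python
def masked_percent(seg_lines, peptide_length):
    """Percentage of residues marked 'x' across all SEG lines, to two decimals."""
    masked = sum(line.count("x") for line in seg_lines)
    return round(100 * masked / peptide_length, 2)


# The two non-empty SEG lines above mask 13 and 17 residues of the 658-residue peptide.
seg_lines = ["x" * 13, "x" * 17]
print(masked_percent(seg_lines, 658))  # 4.56
```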


Prosite for DKFZphtes3_8e24.3 

PS00001   264->268   ASN_GLYCOSYLATION   PDOC00001 
PS00001   359->363   ASN_GLYCOSYLATION   PDOC00001 
PS00004   410->414   CAMP_PHOSPHO_SITE   PDOC00004 
PS00005   21->24     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   26->29     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   97->100    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   348->351   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   378->381   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   448->451   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   493->496   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   531->534   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   541->544   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   649->652   PKC_PHOSPHO_SITE    PDOC00005 
PS00006   52->56     CK2_PHOSPHO_SITE    PDOC00006 
PS00006   57->61     CK2_PHOSPHO_SITE    PDOC00006 
PS00006   93->97     CK2_PHOSPHO_SITE    PDOC00006 
PS00006   123->127   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   155->159   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   252->256   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   271->275   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   279->283   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   281->285   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   293->297   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   299->303   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   305->309   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   320->324   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   322->326   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   340->344   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   365->369   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   449->453   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   493->497   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   505->509   CK2_PHOSPHO_SITE    PDOC00006 
PS00007   480->488   TYR_PHOSPHO_SITE    PDOC00007 
PS00007   190->198   TYR_PHOSPHO_SITE    PDOC00007 
PS00008   9->15      MYRISTYL            PDOC00008 
PS00008   432->438   MYRISTYL            PDOC00008 
PS00008   620->626   MYRISTYL            PDOC00008 
PS00009   1->5       AMIDATION           PDOC00009 
PS00009   378->382   AMIDATION           PDOC00009 
PS00017   393->401   ATP_GTP_A           PDOC00017 

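The PKC_PHOSPHO_SITE entries (PS00005) in the table above can be reproduced with a simple pattern scan. The sketch below assumes the published PROSITE consensus [ST]-x-[RK] for this motif and assumes that the listed ranges are 1-based with an exclusive end (start plus motif length), which is inferred from the three-residue motif being reported as 21->24. The function name `scan_motif` is hypothetical.

```python
import re

# PROSITE PS00005 consensus [ST]-x-[RK], written as a lookahead so that
# overlapping sites are all reported.
PKC_PHOSPHO_SITE = re.compile(r"(?=[ST].[RK])")


def scan_motif(peptide, pattern, width):
    """Return (start, end) pairs, 1-based start and exclusive end (assumed convention)."""
    return [(m.start() + 1, m.start() + 1 + width) for m in pattern.finditer(peptide)]


# First 60 residues of the DKFZphtes3_8e24.3 peptide (first SEQ line above).
peptide = "MGRRRAPAGGSLGRALMRHQTQRSRSHRHTDSWLHTSELNDGYDWGRLNLQSVTEQSSLD"

print(scan_motif(peptide, PKC_PHOSPHO_SITE, 3))  # [(21, 24), (26, 29)]
```

For this fragment the scan returns the first two PS00005 rows of the table; the remaining rows fall beyond residue 60.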
(No Pfam data available for DKFZphtes3_8e24.3) 
DKFZphtes3_8g11 



group: testes derived 

DKFZphtes3_8g11 encodes a novel proline-rich 939 amino acid protein without similarity to 
known proteins. 

The novel protein contains an ATP/GTP-binding site motif A (P-loop). 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

unknown, proline-rich protein 
1 EST hit (from testis library) 
Sequenced by MediGenomix 
Locus: unknown 
Insert length: 3100 bp 

Poly A stretch at pos. 3056, polyadenylation signal at pos. 3041 

1 AGAGTCTTCC CTCAGCATAT TTTACGATAG AGAAGATCTT GTTCCAATGG 
51 AAGAAAGTGA GGACTCACAG AGTGATTCCC AGACAAGGAT TTCTGAGTCC 

101 CAACACTCCC TCAAGCCAAA TTATCTTTCC CAGGCCAAGA CTGACTTCTC 

151 AGAACAGTTC CAGTTGCTAG AAGATCTGCA GCTAAAAATA GCAGCAAAAC 

201 TCTTAAGGAG TCAAATACCC CCCGATGTGC CTCCACCTCT AGCTTCAGGT 

251 CTAGTCCTAA AATACCCTAT CTGCCTACAG TGTGGCCGAT GTTCAGGACT 

301 TAATTGCCAT CATAAATTAC AGACCACTTC GGGGCCTTAT CTTCTTATCT 

351 ATCCACAGCT CCACCTTGTA CGCACTCCTG AAGGCCATGG TGAGGTTCGG 

401 TTGCATCTTG GCTTTAGGCT GAGAATTGGG AAAAGATCCC AAATCTCAAA 

451 GTATCGTGAA AGAGATAGAC CCGTCATACG GAGAAGCCCT ATATCACCAT 

501 CACAAAGGAA AGCTAAAATC TATACTCAAG CTTCCAAGAG TCCTACTTCC 

551 ACAATAGATT TGCAGTCTGG GCCTTCCCAG TCCCCTGCTC CTGTACAAGT 

601 CTACATCAGG CGAGGACAAC GCAGCAGGCC TGACTTAGTA GAAAAGACAA 

651 AAACTAGAGC ACCTGGGCAC TATGAATTCA CTCAAGTTCA CAACCTACCA 

701 GAGAGTGACT CTGAAAGCAC TCAGAATGAA AAACGGGCTA AAGTGAGAAC 

751 CAAAAAGACC TCTGATTCAA AATATCCAAT GAAGAGAATC ACCAAGCGAC 

801 TTAGAAAACA CAGAAAGTTC TACACAAACA GTAGAACCAC AATAGAGAGT 

851 CCTTCTAGGG AATTAGCAGC CCATTTAAGA AGGAAGAGGA TTGGAGCAAC 

901 TCAGACAAGT ACTGCCTCTT TAAAAAGACA ACCTAAGAAA CCTTCCCAAC 

951 CCAAGTTCAT GCAACTGCTT TTTCAGAGCC TAAAGCGGGC ATTCCAAACA 
1001 GCACACAGAG TTATAGCTTC TGTTGGGCGG AAGCCTGTGG ACGGGACAAG 
1051 GCCAGACAAT TTGTGGGCAA GCAAAAACTA TTATCCAAAA CAAAATGCCA 
1101 GGGACTATTG CTTACCAAGC AGTATCAAAA GAGACAAGAG GTCAGCTGAC 
1151 AAGCTAACGC CAGCAGGCTC AACCATTAAG CAGGAGGACA TATTGTGGGG 
1201 AGGAACGGTC CAGTGCAGAT CAGCTCAACA GCCAAGAAGA GCTTACTCTT 
1251 TCCAACCCAG ACCTCTTCGA CTGCCCAAGC CCACAGATTC CCAAAGTGGT 
1301 ATTGCTTTCC AAACTGCCTC AGTGGGGCAG CCTCTGAGAA CTGTTCAAAA 
1351 GGACAGTAGT AGCAGATCAA AGAAAAACTT CTATAGAAAT GAAACCTCCA 
1401 GCCAGGAGTC TAAGAACTTG TCCACACCAG GAACCAGAGT TCAGGCCCGA 
1451 GGAAGAATCC TACCTGGTTC CCCTGTGAAG AGAACCTGGC ACCGACATCT 
1501 TAAAGACAAA CTCACACACA AGGAGCATAA CCACCCCAGC TTCTATAGGG 
1551 AGAGAACCCC ACGCGGTCCT TCTGAGAGAA CCCGTCATAA CCCCTCTTGG 
1601 AGAAACCATC GCAGTCCCTC TGAGAGAAGC CAACGCAGTT CCTTGGAGAG 
1651 AAGACATCAC AGTCCCTCTC AGAGGAGCCA CTGCAGTCCC TCTAGGAAAA 
1701 ACCATTCCAG TCCTTCTGAG AGAAGCTGGC GCAGTCCGTC TCAGAGAAAT 
1751 CACTGCAGTC CCCCCGAGAG GAGCTGTCAC AGTCTCTCTG AAAGGGGCCT 
1801 TCACAGTCCC TCTCAGAGGA GCCATCGCGG TCCCTCTCAG AGAAGACATC 
1851 ACAGTCCCTC AGAGAGAAGC CATCGCAGTC CCTCAGAGAG AAGCCATCGC 
1901 AGTCCCTCTG AGAGAAGACA TCGCAGTCCC TCCCAGAGGA GCCATCGCGG 
1951 TCCCTCAGAG AGAAGCCATT GCAGTCCCTC TGAGAGAAGA CATCGCAGTC 
2001 CCTCTCAGAG GAGCCATCGT GGTCCCTCTG AGAGAAGACA TCACAGTCCC 
2051 TCTAAGAGAA GCCATCGCAG TCCCGCTCGG AGGAGCCATC GCAGTCCCTC 
2101 AGAGAGAAGC CATCACAGTC CCTCTGAGAG AAGCCATCAC AGTCCCTCTG 
2151 AGAGAAGACA TCACAGTCCC TCTGAGAGAA GCCATTGCAG TCCCTCTGAG 
2201 AGAAGCCATT GCAGTCCCTC TGAGAGAAGA CATCGCAGTC CCTCTGAGAG 
2251 AAGACATCAC AGTCCCTCAG AGAAAAGCCA TCACAGTCCC TCTGAGAGAA 
2301 GCCATCACAG TCCCTCTGAG AGAAGACGTC ACAGTCCCTT GGAGAGGAGC 
2351 CGTCACAGTC TCTTGGAGAG GAGCCATCGC AGTCCCTCTG AGAGGAGATC 
2401 TCACAGGTCC TTTGAGAGGA GCCATCGTAG GATTTCTGAG AGAAGTCACA 
2451 GTCCCTCAGA GAAGAGCCAC CTCAGTCCCT TGGAAAGAAG CCGTTGCAGT 
2501 CCCTCTGAGA GGAGAGGACA CAGTTCCTCT GGGAAAACCT GTCACAGTCC 
2551 CTCTGAGAGA AGCCATCGCA GTCCCTCCGG GATGAGGCAA GGGAGGACCT 
2601 CTGAGAGGAG CCATCGCAGT TCCTGTGAGA GAACCCGTCA CAGTCCCTCT 
2651 GAGATGAGGC CAGGGAGGCC CTCTGGGAGG AACCATTGCA GTCCCTCTGA 

2701 GAGGAGCCGA CGCAGTCCCC TTAAGGAGGG ACTCAAGTAC AGTTTCCCTG 

2751 GAGAGAGGCC CAGCCATAGT TTGTCTAGAG ATTTCAAGAA TCAAACAACT 

2801 CTCCTCGGGA CCACACATAA AAATCCCAAA GCAGGGCAAG TGTGGAGGCC 

2851 TGAAGCTACT CGATGAGGCG AGGTCCGCCC CTATTATTCA TTGTCCTAAG 

2901 TCTTCATCGT GCTGCCCTTT CCAGGCTTCT TTCCTGCTCA GCCACTGCCT 

2951 CCAATTCCTG CGCCCCCAGC GTGGAAAGGC TTCCATTTCT CTCTACCGGG 

3001 GGGGAGGCGG GTGAGAATGG GTCTGTAATT TCTCTAAGAT GAATAAAGGG 

3051 GCAGTTAATT AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAGG 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 47 bp to 2863 bp; peptide length: 939 
Category: similarity to unknown protein 
Classification: unclassified 
Prosite motifs: ATP_GTP_A (824-832) 



1 MEESEDSQSD SQTRISESQH SLKPNYLSQA KTDFSEQFQL LEDLQLKIAA 
51 KLLRSQIPPD VPPPLASGLV LKYPICLQCG RCSGLNCHHK LQTTSGPYLL 
101 IYPQLHLVRT PEGHGEVRLH LGFRLRIGKR SQISKYRERD RPVIRRSPIS 
151 PSQRKAKIYT QASKSPTSTI DLQSGPSQSP APVQVYIRRG QRSRPDLVEK 
201 TKTRAPGHYE FTQVHNLPES DSESTQNEKR AKVRTKKTSD SKYPMKRITK 
251 RLRKHRKFYT NSRTTIESPS RELAAHLRRK RIGATQTSTA SLKRQPKKPS 
301 QPKFMQLLFQ SLKRAFQTAH RVIASVGRKP VDGTRPDNLW ASKNYYPKQN 
351 ARDYCLPSSI KRDKRSADKL TPAGSTIKQE DILWGGTVQC RSAQQPRRAY 
401 SFQPRPLRLP KPTDSQSGIA FQTASVGQPL RTVQKDSSSR SKKNFYRNET 
451 SSQESKNLST PGTRVQARGR ILPGSPVKRT WHRHLKDKLT HKEHNHPSFY 
501 RERTPRGPSE RTRHNPSWRN HRSPSERSQR SSLERRHHSP SQRSHCSPSR 
551 KNHSSPSERS WRSPSQRNHC SPPERSCHSL SERGLHSPSQ RSHRGPSQRR 
601 HHSPSERSHR SPSERSHRSP SERRHRSPSQ RSHRGPSERS HCSPSERRHR 
651 SPSQRSHRGP SERRHHSPSK RSHRSPARRS HRSPSERSHH SPSERSHHSP 
701 SERRHHSPSE RSHCSPSERS HCSPSERRHR SPSERRHHSP SEKSHHSPSE 
751 RSHHSPSERR RHSPLERSRH SLLERSHRSP SERRSHRSFE RSHRRISERS 
801 HSPSEKSHLS PLERSRCSPS ERRGHSSSGK TCHSPSERSH RSPSGMRQGR 
851 TSERSHRSSC ERTRHSPSEM RPGRPSGRNH CSPSERSRRS PLKEGLKYSF 
901 PGERPSHSLS RDFKNQTTLL GTTHKNPKAG QVWRPEATR 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_8g11, frame 2 

TREMBL:AF061185_1 gene: "car90"; product: "cyst germination specific 
acidic repeat protein precursor"; Phytophthora infestans cyst 
germination specific acidic repeat protein precursor (car90) gene, 
complete cds., N = 1, Score = 457, P = 2.3e-39 

TREMBL:AC004561_38 gene: "F16P2.41"; product: "putative proline-rich 
protein"; Arabidopsis thaliana chromosome II BAC F16P2 genomic 
sequence, complete sequence., N = 1, Score = 340, P = 4.2e-27 

TREMBL:AF062655_1 product: "plenty-of-prolines-101"; Mus musculus 
plenty-of-prolines-101 mRNA, complete cds., N = 1, Score = 313, P = 
3.6e-24 

PIR:PN0099 son3 protein - human (fragment), N = 1, Score = 292, P = 
1.2e-22 



>TREMBL:AF061185_1 gene: "car90"; product: "cyst germination specific acidic 
repeat protein precursor"; Phytophthora infestans cyst germination 
specific acidic repeat protein precursor (car90) gene, complete cds. 

Length = 1,489 

HSPs: 



Score = 457 (68.6 bits), Expect = 2.3e-39, P = 2.3e-39 
Identities = 91/444 (20%), Positives = 239/444 (53%) 

Query: 475 SPVKRTWHRHLKDKLTHKEHNHPSFY-RERTPRGPSERTRHNPSWRNHRSPSERSQRSSL 533 
           +P + T + +++ T+ ++ E TP P+E T + P+ +P+E + +S 
Sbjct: 584 APTEETMYAPIEET-TYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAST 642 

Query: 534 ERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQRSH 593 
           E ++P++ + +P+ + P+E + +P++ +P E + ++ +E ++P++ + 
Sbjct: 643 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETT 702 

Query: 594 RGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRSPS 653 
           P++ + P+E + +P+E + +P+E +P + + GP+E + +P+E +P+ 
Sbjct: 703 YAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPT 762 

Query: 654 QRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSERSH 713 
           + + P+E + P+ + +P + +P+E + ++P+E + ++P+E + P+E + 
Sbjct: 763 EETPYAPTEETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETT 822 

Query: 714 CSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLL 773 
           +P+E + P+E +P+E ++P+E++ ++P+E++ ++P+E ++P E + + 
Sbjct: 823 YAPTEETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPT 882 

Query: 774 ERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGKTC 832 
           E + +P++ ++ E + + E +++P+E++ +P E + P+E ++ + +T 
Sbjct: 883 EETTYAPTKETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 942 

Query: 833 HSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRSPL 892 
           ++P+E + +P+ +E + + E T + P+E P+ +P+E + +P+ 
Sbjct: 943 YAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPI 1002 

Query: 893 KEGLKYSFPGERPSHSLSRDFKNQTT 918 
           +E Y+ P E +++ + + + T 
Sbjct: 1003 EE-TTYA-PTEETTYAPAEETPYEPT 1026 

Score = 445 (66.8 bits), Expect = 4.5e-38, P = 4.5e-38 
Identities = 83/394 (21%), Positives = 212/394 (53%) 

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 
           E TP P+E T + P+ +P+E + + E ++P++ + +P+ + P+E + 
Sbjct: 763 EETPYAPTEETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETT 822 

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 
           +P++ P E + ++ +E ++P++ + P+++ ++P+E + +P+E + P+ 
Sbjct: 823 YAPTEETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPT 882 

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 
           E +P++ + P+E + + +E +P++ + P+E + P++ + +P + 
Sbjct: 883 EETTYAPTKETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 942 

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 
           +P+E + ++P+E + ++P+E ++P+E + P+E + +P+E +P+E ++P 
Sbjct: 943 YAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPI 1002 

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800 
           E++ ++P+E + ++P+E + P E + ++ E + +P+E ++ S E + + E + 
Sbjct: 1003 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETT 1062 

Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860 
           ++P+E++ P E + +P+E ++ + +T ++P+E + +P+ +E + 
Sbjct: 1063 YAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPT 1122 



Query: 861 ERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKE 894 

E T ++P+E P+ +P E + P+E 

Sbjct: 1123 EETTYAPTEETTYAPTEETMYAPIEETTYGPTEE 1156 

Score = 439 (65.9 bits), Expect = 2.0e-37, P = 2.0e-37 
Identities = 86/421 (20%), Positives = 223/421 (52%) 



Query: 475 SPVKRTWHRHLKDKLTHKEHNHPSFY-RERTPRGPSERTRHNPSWRNHRSPSERSQRSSL 533 

+P + T + +K T+ ++ E TP P+E T + P+ +P+E + +S 

Sbjct: 848 APTEETTYAPT-EKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEETTYAST 906 

Query: 534 ERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQRSH 593 

E ++P++ + +P+ + P+E + +P++ +P E + ++ +E ++P++ + 
Sbjct: 907 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETT 966 






Query: 594 RGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRSPS 653 

P++ + P+E + +P+E + +P+E +P + + P+E + +P+E P+ 
Sbjct: 967 YAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEETPYEPT 1026 

Query: 654 QRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSERSH 713 

+ + P+E ++P++ + + + +P+E + ++P+E + + P+E ++P+E + 
Sbjct: 1027 EETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 1086 

Query: 714 CSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLL 773 

+P+E + +P+E +P+E ++P+E++ + P+E + ++P+E ++P E + ++ + 
Sbjct: 1087 YAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPI 1146 

Query: 774 ERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGKTC 832 

E + P+E ++ E + + E ++P+E++ P + +P+E ++ + +T 
Sbjct: 1147 EETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGETTYAPTEETTYAPTEETT 1206 

Query: 833 HSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRSPL 892 

++P+E + +P+ +E + + E T + P+E P+ +P+E + +P 

Sbjct: 1207 YAPTEETTYAPTEETPYEPTEETTYAPTEETTYEPTEETTYAPTEETTYAPTEETTYAPT 1266 

Query: 893 KE 894 
+E 

Sbjct: 1267 EE 1268 

Score = 439 (65.9 bits), Expect = 2.0e-37, P = 2.0e-37 
Identities = 91/434 (20%), Positives = 232/434 (53%) 

Query: 475 SPVKRTWHRHLKDKLTHKEHNHPSFY-RERTPRGPSERTRHNPSWRNHRSPSERSQRSSL 533 

+P + T + +K T+ ++ E TP P+E T + P+ +P+E + +S 

Sbjct: 440 APTEETTYAPT-EKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEETTYAST 498 

Query: 534 ERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQRSH 593 

E ++P++ + +P+ + P+E + +P++ +P E + ++ +E ++P++ + 
Sbjct: 499 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETT 558 

Query: 594 RGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRSPS 653 

P++ + P+E + +P+E + +P+E +P + + P+E + +P+E P+ 
Sbjct: 559 YAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEETPYEPT 618 

Query: 654 QRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSERSH 713 

+ + P+E ++P++ + + + +P+E + ++P+E + + P+E ++P+E + 
Sbjct: 619 EETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 678 

Query: 714 CSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLL 773 

+P+E + +P+E +P+E ++P+E++ + P+E + ++P+E ++P E + ++ + 
Sbjct: 679 YAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPI 738 

Query: 774 ERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGKTC 832 

E + P+E ++ E + + E ++P+E++ P + +P+E ++ + +T 
Sbjct: 739 EETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGETTYAPTEETTYAPTEETT 798 

Query: 833 HSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRSPL 892 

++P+E ++P TE++ET ++P+E P P+ +P+E + +P 

Sbjct: 799 YAPTEETTYAP-------TEETPYEPT-EETTYAPTEETPYEPTEETTYTPTEETTYAPT 850 

Query: 893 KEGLKYSFPGERPSHS 908 

+E Y+ P E+ +++ 
Sbjct: 851 EE-TTYA-PTEKTTYA 864 

Score = 437 (65.6 bits), Expect = 3.3e-37, P = 3.3e-37 
Identities = 85/417 (20%), Positives = 223/417 (53%) 

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 

E TP P+E T + P+ +P+E + + E+ ++P++ + +P+ + P+E + 

Sbjct: 419 EETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETT 478 

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 

+P++ +P E + ++ +E ++P++ + P++ + P+E + +P+E + +P+ 
Sbjct: 479 YAPTKETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPT 538 

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 

E +P++ + P+E + +P+E P++ + P+E ++P++ + +P + 

Sbjct: 539 EETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETT 598 

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

+P+E + ++P+E + + P+E ++P+E + +P+E + + +E +P+E ++P+ 
Sbjct: 599 YAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPA 658 

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800 

E++ + P+E + ++P+E ++P E + ++ E + +P+E ++ E + + E + 
Sbjct: 659 EETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETT 718 






Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860 

++P+E++ +P E + +P E + + +T ++P+E + +P+ +E + 
Sbjct: 719 YAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPT 778 

Query: 861 ERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKEGLKYSFPGERPSHSLSRDFKNQTT 918 

T ++P+E P+ +P+E + +P +E Y P E +++ + + + T 
Sbjct: 779 GETTYAPTEETTYAPTEETTYAPTEETTYAPTEE-TPYE-PTEETTYAPTEETPYEPT 834 


Score = 428 (64.2 bits), Expect = 3.1e-36, P = 3.1e-36 
Identities = 89/440 (20%), Positives = 228/440 (51%) 




Query: 473 PGSPVKRTWHRHLKDKLTHKEHNHPSFYR-ERTPRGPSERTRHNPSWRNHRSPSERSQRS 531 

P P + T + K+ T+ ++ E T P+E T + P+ P+E + + 
Sbjct: 470 PYEPTEETTYAPTKET-TYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYA 528 

Query: 532 SLERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQR 591 

E ++P++ + +P+ + +P+E + + P++ P E + ++ +E ++P++ 
Sbjct: 529 PTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEE 588 

Query: 592 SHRGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRS 651 

+ P + ++P+E + +P+E + P+E +P++ + P+E + + +E + 
Sbjct: 589 TMYAPIEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYA 648 

Query: 652 PSQRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSER 711 

P++ + P+E + P++ + +P + +P+E + ++P+E + ++P+E ++P+E 
Sbjct: 649 PTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEE 708 

Query: 712 SHCSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHS 771 

+ P+E + +P+E +P+E ++P E++ + P+E + ++P+E ++P E + ++ 
Sbjct: 709 TPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYA 768 

Query: 772 LLERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGK 830 

E + P+ ++ E + + E +++P+E++ +P E + P+E ++ + + 
Sbjct: 769 PTEETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEE 828 

Query: 831 TCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRS 890 

T + P+E + +P+ +E + + E+T ++P+E P+ P+E + + 
Sbjct: 829 TPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETTYA 888 

Query: 891 PLKEGLKYSFPGERPSHSLSRD 912 

P KE Y+ P E +++ + + 
Sbjct: 889 PTKE-TTYA-PTEETTYASTEE 908 




Score = 427 (64.1 bits), Expect = 4.0e-36, P = 4.0e-36 
Identities = 81/394 (20%), Positives = 213/394 (54%) 




Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 

E T GP+E T + P+ +P+E + + E + P+ + +P+ + +P+E + 
Sbjct: 739 EETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGETTYAPTEETTYAPTEETT 798 

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 

+P++ +P E + + +E ++P++ + P++ ++P+E + +P+E + +P+ 
Sbjct: 799 YAPTEETTYAPTEETPYEPTEETTYAPTEETPYEPTEETTYTPTEETTYAPTEETTYAPT 858 

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 

E+ +P++ + P+E + P+E +P++ + P+E ++ ++ + +P + 
Sbjct: 859 EKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEETTYASTEETTYAPTEETT 918 

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

+P+E + + P+E + ++P+E ++P+E + +P+E + +P+E +P+E + P+ 
Sbjct: 919 YAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPT 978 

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800 

E++ ++P+E + ++P+E ++P+E + ++ E + +P+E + E + + E + 
Sbjct: 979 EETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 1038 

Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860 

++P+E++ + E + +P+E ++ + +T + P+E + +P+ +E + + 
Sbjct: 1039 YAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPT 1098 

Query: 861 ERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKE 894 

E T ++P+E P+ P+E + +P +E 
Sbjct: 1099 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEE 1132 




Score = 424 (63.6 bits), Expect = 8.5e-36, P = 8.5e-36 
Identities = 81/394 (20%), Positives = 210/394 (53%) 

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 

E T P+E T + P+ +P+E + + E + P++ + +P+ + +P+E + 

Sbjct: 939 EETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETM 998 






Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 

+P + +P E + ++ +E + P++ + P++ ++P+E + + +E + +P+ 
Sbjct: 999 YAPIEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPT 1058 

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 

E +P++ + P+E + +P+E +P++ + P+E ++P++ + +PA + 
Sbjct: 1059 EETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETP 1118 

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

P+E + ++P+E + ++P+E ++P E + P+E + +P+E +P+E ++P+ 
Sbjct: 1119 YEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYAPT 1178 

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800 

E++ + P+ + ++P+E ++P E + ++ E + +P+E + E + + E + 
Sbjct: 1179 EETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEETT 1238 

Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860 

+ P+E++ +P E + +P+E ++ + +T ++P + + P+ +E + + 

Sbjct: 1239 YEPTEETTYAPTEETTYAPTEETTYAPTEETMYAPIDETTYGPTEETTYAPTEATTYAPT 1298 

Query: 861 ERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKE 894 

E T ++P+E P+G +P+E + +P +E 

Sbjct: 1299 EETPYAPTEETTYEPTGETTYAPTEETTYAPTEE 1332 

Score = 422 (63.3 bits), Expect = 1.4e-35, P = 1.4e-35 
Identities = 84/407 (20%), Positives = 216/407 (53%) 



Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 

E T P+E T + P+ P+E + + E + P++ + +P+ + +P+E + 
Sbjct: 795 EETTYAPTEETTYAPTEETPYEPTEETTYAPTEETPYEPTEETTYTPTEETTYAPTEETT 854 

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 

+P+++ +P E + ++ +E + P++ + P++ ++P+E + + +E + +P+ 
Sbjct: 855 YAPTEKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEETTYASTEETTYAPT 914 

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 

E +P++ + P+E + +P+E +P++ + P+E ++P++ + +PA + 
Sbjct: 915 EETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETP 974 

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

P+E + ++P+E + ++P+E ++P E + +P+E + +P+E P+E ++P+ 
Sbjct: 975 YEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEETPYEPTEETTYAPT 1034 

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800 

E++ ++P+E + ++ +E ++P E + ++ E + P+E ++ E + + E + 
Sbjct: 1035 EETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETT 1094 

Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860 

++P+E++ +P E + +P+E + + +T ++P+E + +P+ E + 
Sbjct: 1095 YAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPT 1154 

Query: 861 ERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKEGLKYSFPGERPSHS 908 

E T ++P+E P+ +P+E + P E Y+ P E +++ 
Sbjct: 1155 EETTYAPTEATTYAPTEETPYAPTEETTYEPTGE-TTYA-PTEETTYA 1200 




Score = 421 (63.2 bits), Expect = 1.8e-35, P = 1.8e-35 
Identities = 86/418 (20%), Positives = 219/418 (52%) 




Query: 491 HKEHNHPSFYRERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSR 550 

H H E T P+E T + P+ +P+E + + E + P++ + +P+ 
Sbjct: 376 HYAHIEKPCDTEVTMYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYTPTE 435 

Query: 551 KNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHR 610 

+ +P+E + +P+++ +P E + ++ +E + P++ + P++ ++P+E + 
Sbjct: 436 ETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEETTY 495 

Query: 611 SPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSK 670 

+ +E + +P+E +P++ + P+E + +P+E +P++ + P+E ++P++ 
Sbjct: 496 ASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTE 555 

Query: 671 RSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHR 730 

+ +PA + P+E + ++P+E + ++P+E ++P E + +P+E + +P+E 
Sbjct: 556 ETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEETPY 615 

Query: 731 SPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFE 790 

P+E ++P+E++ ++P+E + ++ +E ++P E + ++ E + P+E ++ E 
Sbjct: 616 EPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTE 675 

Query: 791 RS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQG 849 

+ + E +++P+E++ +P E + +P+E + + +T ++P+E + +P+ 











Sbjct: 676 ETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMY 735 

Query: 850 RTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKEGLKYSFPGERPSHS 908 

E + E T ++P+E P+ +P+E + P E Y+ P E +++ 
Sbjct: 736 APIEETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGE-TTYA-PTEETTYA 792 

Score = 420 (63.0 bits), Expect = 2.3e-35, P = 2.3e-35 
Identities = 82/393 (20%), Positives = 206/393 (52%) 

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 

E TP P+E T + P+ +P+E + + +E ++P++ + +P+ + P+E + 
Sbjct: 971 EETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEETPYEPTEETT 1030 

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 

+P++ +P E + ++ +E ++P++ + P++ + P+E + +P+E + + P+ 
Sbjct: 1031 YAPTEETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPT 1090 

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 

E +P++ + P+E + +P+E P++ + P+E ++P++ + +P + 
Sbjct: 1091 EETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETT 1150 

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

P+E + ++P+E + ++P+E ++P+E + P+ + +P+E +P+E ++P+ 
Sbjct: 1151 YGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGETTYAPTEETTYAPTEETTYAPT 1210 

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800 

E++ ++P+E + + P+E ++P E + + E + +P+E ++ E + + E 
Sbjct: 1211 EETTYAPTEETPYEPTEETTYAPTEETTYEPTEETTYAPTEETTYAPTEETTYAPTEETM 1270 

Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860 

++P +++ P E + +P+E ++ + +T ++P+E + P+G +E + + 
Sbjct: 1271 YAPIDETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGETTYAPTEETTYAPT 1330 

Query: 861 ERTRHSPSEMRPGRP SGRNHCSPSE 885 

E T ++P E P P S C+ E 
Sbjct: 1331 EETTYAPMEETPYEPAEESTSTVSTEKPCNTEE 1363 

Score = 419 (62.9 bits), Expect = 3.0e-35, P = 3.0e-35 
Identities = 83/411 (20%), Positives = 215/411 (52%) 

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 

E T P+E T + P+ +P+E + E ++P++ + +P+ + +P E + 
Sbjct: 947 EETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETT 1006 

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 

+P++ +P E + + +E ++P++ + P++ ++ +E + +P+E + +P+ 
Sbjct: 1007 YAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPA 1066 

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 

E P++ + P+E + +P+E +P++ + P+E ++P++ + P + 
Sbjct: 1067 EETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETT 1126 

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

+P+E + ++P+E + ++P E + P+E + +P+E + +P+E +P+E + P+ 
Sbjct: 1127 YAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPT 1186 

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800 

++ ++P+E + ++P+E ++P E + ++ E + P+E ++ E + + E + 
Sbjct: 1187 GETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEETTYEPTEETT 1246 

Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860 

++P+E++ +P E + +P+E ++ +T + P+E + +P+ +E + + 
Sbjct: 1247 YAPTEETTYAPTEETTYAPTEETMYAPIDETTYGPTEETTYAPTEATTYAPTEETPYAPT 1306 

Query: 861 

E T + P+ P+ +P+E + +P++E P E + ++S + 
Sbjct: 1307 

Score = 415 (62.3 bits), Expect = 8.0e-35, P = 8.0e-35 
Identities = 84/423 (19%), Positives = 218/423 (51%) 

Query: 473 

P + T + K+ T+ P+E T + P+ P+E + 
Sbjct: 878 

Query: 532 

++P++ + +P+ + +P+E + +P++ P E + ++ +E ++P++ 
Sbjct: 937 

Query: 592 

P + ++P+E + +P+E + P+E +P++ + P+E + + +E 
Sbjct: 997 TMYAPIEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYA 1056 

Query: 652 PSQRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSER 711 

P++ + P+E + P++ + +P + +P+E + ++P+E + ++P+E ++P+E 
Sbjct: 1057 PTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEE 1116 

Query: 712 SHCSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHS 771 

+ P+E + +P+E +P+E ++P E++ + P+E + ++P+E ++P E + ++ 
Sbjct: 1117 TPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYA 1176 

Query: 772 LLERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGK 830 

E + P+ ++ E + + E +++P+E++ +P E + P+E ++ + + 
Sbjct: 1177 PTEETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEE 1236 

Query: 831 TCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRS 890 

T + P+E + +P+ +E + + E T ++P + P+ +P+E + + 

Sbjct: 1237 TTYEPTEETTYAPTEETTYAPTEETTYAPTEETMYAPIDETTYGPTEETTYAPTEATTYA 1296 

Query: 891 PLKE 894 
P +E 

Sbjct: 1297 PTEE 1300 

Score = 403 (60.5 bits), Expect = 1.6e-33, P = 1.6e-33 
Identities = 84/394 (21%), Positives = 213/394 (54%) 

Query: 501 RERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERS 560 

RE T PSE T + P +P+E+ +E + + ++ +P++ ++P+ER 

Sbjct: 319 REETTAAPSEDTTYAPREVTPYAPTEKPY--DVEETTYVTEESTY-APTKSETNAPTERM 375 

Query: 561 WRSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSP 620 

+ ++ C E + ++ +E ++P++ + P++ ++P+E + P+E + +P 
Sbjct: 376 HYAHIEKP-CDT-EVTMYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYTP 433 

Query: 621 SERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRS 680 

+E +P++ + P+E++ +P+E +P++ + P+E ++P+K + +P + 
Sbjct: 434 TEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEET 493 

Query: 681 HRSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSP 740 

+ +E + ++P+E + ++P+E + P+E + +P+E + +P+E +P+E ++P 
Sbjct: 494 TYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAP 553 

Query: 741 SEKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISER 799 

+E++ ++P+E + + P+E ++P E+++E++PE++ E++ E 
Sbjct: 554 TEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEET 613 

Query: 800 SHSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSS 859 

+ P+E++ +P E + +P+E ++S+ +T ++P+E + +P+ +E + + 

Sbjct: 614 PYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAP 673 

Query: 860 CERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKE 894 

E T ++P+E P+ +P+E + +P +E 

Sbjct: 674 TEETTYAPTEETTYAPTEETTYAPTEETTYAPAEE 708 

Score = 398 (59.7 bits), Expect = 5.5e-33, P = 5.5e-33 
Identities = 84/402 (20%), Positives = 209/402 (51%) 

Query: 475 SPVKRTWHRHLKDKLTHKEHNHPSFY-RERTPRGPSERTRHNPSWRNHRSPSERSQRSSL 533 

+P + T + +++ T+ ++ E TP P+E T + P+ +P+E + +S 

Sbjct: 992 APTEETMYAPIEET-TYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAST 1050 

Query: 534 ERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQRSH 593 

E ++P++ + +P+ + P+E + +P++ +P E + ++ +E ++P++ + 
Sbjct: 1051 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETT 1110 

Query: 594 RGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRSPS 653 

P++ + P+E + +P+E + +P+E +P + + GP+E + +P+E +P+ 
Sbjct: 1111 YAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPT 1170 

Query: 654 QRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSERSH 713 

+ + P+E + P+ + +P + +P+E + ++P+E + ++P+E + P+E + 
Sbjct: 1171 EETPYAPTEETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETT 1230 

Query: 714 CSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLL 773 

+P+E + P+E +P+E ++P+E++ ++P+E + ++P + + P E + ++ 
Sbjct: 1231 YAPTEETTYEPTEETTYAPTEETTYAPTEETTYAPTEETMYAPIDETTYGPTEETTYAPT 1290 

Query: 774 ERSHRSPSERRSHRSFERSHRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGKTCH 833 

E + +P+E + E E ++ P+ ++ +P E + +P+E ++ +T + 

Sbjct: 1291 EATTYAPTEETPYAPTE ETTYEPTGETTYAPTEETTYAPTEETTYAPMEETPY 1343 

Query: 834 SPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPS 876 
P+E S+S+ T E + +ET PS+ P+ 
Sbjct: 1344 EPAEESTSTVSTEKPCNTEEFTDEPTDEPT-DEPSDEPTDEPT 1385 

Score = 368 (55.2 bits), Expect = 9.5e-30, P = 9.5e-30 
Identities = 79/386 (20%), Positives = 211/386 (54%) 

Query: 524 PSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSER 583 

PS+ ++ + E + P + + +PS +P E + +P+++ + E + + ++E 

Sbjct: 303 PSDETEAPT-EGTTYVPREETTAAPSEDTTYAPREVTPYAPTEKPY--DVEETTY-VTEE 358 

Query: 584 GLHSPSQRSHRGPSQRRHHSPSER SHRSPSERSHRSPSERRHRSPSQRSHRGPS 637 

++P++ P++R H++ E+ + +P+E + +P+E +P++ + P+ 

Sbjct: 359 STYAPTKSETNAPTERMHYAHIEKPCDTEVTMYAPTEETTYAPTEETTYAPTEETTYAPT 418 

Query: 638 ERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSH 697 

E + P+E +P++ + P+E ++P++++ +P + +P+E + + P+E + 
Sbjct: 419 EETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETT 478 

Query: 698 HSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPS 757 

++P++ ++P+E + + +E + +P+E +P+E + P+E++ ++P+E + ++P+ 
Sbjct: 479 YAPTKETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPT 538 

Query: 758 ERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSR 816 

E ++P E + ++ E + +P+E + E + + E +++P+E++ +P+E + 
Sbjct: 539 EETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETT 598 

Query: 817 CSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPS 876 

+P+E ++ + +T + P+E + +P+ +E + +S E T ++P+E P+ 

Sbjct: 599 YAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPA 658 

Query: 877 GRNHCSPSERSRRSPLKEGLKYSFPGERPSHS 908 

P+E + +P +E Y+ P E +++ 
Sbjct: 659 EETPYEPTEETTYAPTEE-TTYA-PTEETTYA 688 

Score = 337 (50.6 bits), Expect = 2.1e-26, P = 2.1e-26 
Identities = 66/328 (20%), Positives = 170/328 (51%) 

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 

E T P+E T + P+ +P+E + + E ++P++ + +P+ + +P+E + 

Sbjct: 1059 EETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETP 1118 

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 

P++ +P E + ++ +E +++P + + GP++ ++P+E + +P+E + +P+ 
Sbjct: 1119 YEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYAPT 1178 

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 

E P+ + P+E + +P+E +P++ + P+E + P++ + +P + 

Sbjct: 1179 EETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEETT 1238 

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

P+E + ++P+E + ++P+E ++P+E + +P + + P+E +P+E ++P+ 
Sbjct: 1239 YEPTEETTYAPTEETTYAPTEETTYAPTEETMYAPIDETTYGPTEETTYAPTEATTYAPT 1298 

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERSHRRIS 797 

E++ ++P+E + + P+ ++P E + ++ E + +P E + E S +S 
Sbjct: 1299 EETPYAPTEETTYEPTGETTYAPTEETTYAPTEETTYAPMEETPYEPAEESTSTVSTEKP 1358 

Query: 798 ERSHSPSEKSHLSPLERSRCSPSE 821 

E + P+++ P + P++ 
Sbjct: 1359 CNTEEFTDEPTDEPTDEPSDEPTDEPTD 1386 

Score = 333 (50.0 bits), Expect = 5.7e-26, P = 5.7e-26 
Identities = 63/320 (19%), Positives = 166/320 (51%) 

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 

E T P+E T + P+ +P+E + + E ++P++ + P+ + +P+E + 

Sbjct: 1075 EETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 1134 

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 

+P++ +P E + + +E ++P++ + P++ ++P+E + P+ + +P+ 
Sbjct: 1135 YAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGETTYAPT 1194 

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 

E +P++ + P+E + +P+E P++ + P+E + P++ + +P + 

Sbjct: 1195 EETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEETTYEPTEETTYAPTEETT 1254 

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

+P+E + ++P+E + ++P + + P+E + +P+E + +P+E +P+E + P+ 
Sbjct: 1255 YAPTEETTYAPTEETMYAPIDETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPT 1314 

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERSHRRISERSH 801 
++ ++P+E + ++P+E ++P+E + + ES+S+ +E+ E + 

Sbjct: 1315 GETTYAPTEETTYAPTEETTYAPMEETPYEPAEESTSTVSTEKPCNTEEFTDEPTDEPTD 1374 



Query: 802 SPSEKSHLSPLERSRCSPSE 821 

PS++ P + P++ 
Sbjct: 1375 EPSDEPTDEPTDEPTDLPTD 1394 




Score = 303 (45.5 bits), Expect = 9.6e-23, P = 9.6e-23 
Identities = 70/322 (21%), Positives = 170/322 (52%) 




Query: 584 GLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCS 643 

G + PS + P++ + P E + +PSE + +P E +P++ + + E ++ + 
Sbjct: 299 GGYEPSDETE-APTEGTTYVPREETTAAPSEDTTYAPREVTPYAPTEKPY-DVEETTYVT 356 

Query: 644 PSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSER 703 

E +P++ P+ER H++ ++ + + +P+E + ++P+E + ++P+E 
Sbjct: 357 --EESTYAPTKSETNAPTERMHYAHIEKPCDTEV--TMYAPTEETTYAPTEETTYAPTEE 412 

Query: 704 RHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHS 763 

++P+E + P+E + +P+E +P+E ++P+EK+ ++P+E + ++P+E + 
Sbjct: 413 TTYAPTEETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYE 472 

Query: 764 PLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSER 822 

P E + ++ + + +P+E ++ S E + + E +++P+E++ P E + +P+E 
Sbjct: 473 PTEETTYAPTKETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEE 532 

Query: 823 RGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCS 882 

++ + +T ++P+E + +P+ +E + E T ++P+E P+ + 
Sbjct: 533 TTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYA 592 

Query: 883 PSERSRRSPLKEGLKYSFPGERP 905 

P E + +P +E Y+ E P 
Sbjct: 593 PIEETTYAPTEE-TTYAPAEETP 614 




Score = 151 (22.7 bits), Expect = 2.0e-06, P = 2.0e-06 
Identities = 45/198 (22%), Positives = 103/198 (52%) 




Query: 716 PSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLLER 775 

PS+ + +P+E P E +PSE + ++P E + ++P+E+ +E + + + E 
Sbjct: 303 PSDETE-APTEGTTYVPREETTAAPSEDTTYAPREVTPYAPTEKPYD--VEETTY-VTEE 358 

Query: 776 SHRSPSERRSHRSFERSHRRISERS HSPSEKSHLSPLERSRCSPSERRGHSSS 828 

S +P++ ++ ER H E+ ++P+E++ +P E + +P+E ++ + 
Sbjct: 359 STYAPTKSETNAPTERMHYAHIEKPCDTEVTMYAPTEETTYAPTEETTYAPTEETTYAPT 418 

Query: 829 GKTCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSR 888 

+T + P+E + +P+ +E + + E+T ++P+E P+ P+E + 
Sbjct: 419 EETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETT 478 

Query: 889 RSPLKEGLKYSFPGERPSHSLSRD 912 

+P KE Y+ P E +++ + + 
Sbjct: 479 YAPTKE-TTYA-PTEETTYASTEE 500 





Pedant information for DKFZphtes3_8gll, frame 2 



Report for DKFZphtes3_8gll.2 



[LENGTH] 954 

[MW] 110063.05 

[pI] 11.40 

[PROSITE] ATP_GTP_A 1 

[KW] Irregular 

[KW] LOW_COMPLEXITY 27.67 % 



SEQ ESSLSIFYDREDLVPKEESEDSQSDSQTRISESQHSLKPNYLSQAKTDFSEQFQLLEDLQ 

SEG xxxxxxxxxxx 

PRD ccceeeccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhh 

SEQ LKIAAKLLRSQIPPDVPPPLASGLVLKYPICLQCGRCSGLNCHHKLQTTSGPYLLIYPQL 

SEG 

PRD hhhhhhhhhhcccccccccccceeeeecceeecccccccccccccccccccceeeehhhh 

SEQ HLVRTPEGHGEVRLHLGFRLRIGKRSQISKYRERDRPVIRRSPISPSQRKAKIYTQASKS 

SEG 

PRD hcccccccccceeecccceeeccccccccccccccceeeeeccccccchhhhhhhccccc 

SEQ PTSTIDLQSGPSQSPAPVQVYIRRGQRSRPDLVEKTKTRAPGHYEFTQVHNLPESDSEST 

SEG 

PRD ccccccccccccccccceeeeeeeccccccchhhhhhcccccceeeeeecccccccccch 

SEQ QNEKRAKVRTKKTSDSKYPMKRITKRLRKHRKFYTNSRTTIESPSRELAAHLRRKRIGAT 

SEG 

PRD hhhhhhhhhhccccccccccchhhhhhhhhhhccccccccccccchhhhhhhhhhhhhcc 

SEQ QTSTASLKRQPKKPSQPKFMQLLFQSLKRAFQTAHRVIASVGRKPVDGTRPDNLWASKNY 

SEG 

PRD ccchhhhhccccccccchhhhhhhhhhhhhhhhhhhhhhccccccccccccccccccccc 

SEQ YPKQNARDYCLPSSIKRDKRSADKLTPAGSTIKQEDILWGGTVQCRSAQQPRRAYSFQPR 

SEG 

PRD cccccccccccccccccccccccccccccccccccceeeccccccccccccccccccccc 

SEQ PLRLPKPTDSQSGIAFQTASVGQPLRTVQKDSSSRSKKNFYRNETSSQESKNLSTPGTRV 

SEG 

PRD ccccccccccccceeeecccccccceeeeeccccccccccccccccccccccccccccee 

SEQ QARGRILPGSPVKRTWHRHLKDKLTHKEHNHPSFYRERTPRGPSERTRHNPSWRNHRSPS 

SEG xxxxx 

PRD eeecccccccccccccccccccccccccccccceeeeccccccccccccccccccccccc 

SEQ ERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGL 

SEG xxxxxxxxxxxxxx xxxxxxxxxxxx 

PRD chhhhhhhhhhccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPS 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ ERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRH 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPL 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ ERSRHSLLERSHRSPSERRSHRSFERSHRRISERSHSPSEKSHLSPLERSRCSPSERRGH 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhccccccccchhhhhhhhhhhhhccccccccccccccccccccccccccc 

SEQ SSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSE 

SEG xxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ RSRRSPLKEGLKYSFPGERPSHSLSRDFKNQTTLLGTTHKNPKAGQVWRPEATR 

SEG 

PRD ccccccccccceeecccccccccccccccccccccccccccccccccccccccc 
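
The [KW] LOW_COMPLEXITY percentage reported above appears to be the share of residues masked with x in the SEG rows, relative to [LENGTH]. The Python sketch below shows that bookkeeping under this assumed reading of the field; the function name and the toy rows are illustrative and are not part of the Pedant output.

```python
def low_complexity_percent(seg_rows, length):
    """Percentage of residues masked with 'x' across all SEG rows.

    seg_rows: SEG mask rows, one string per 60-residue block
    length:   total number of residues in the protein
    """
    masked = sum(row.count("x") for row in seg_rows)
    return round(100.0 * masked / length, 2)


# Toy mask: 10 of 30 residues are flagged as low complexity.
toy_rows = ["xxxx      ", "          ", "  xxxxxx  "]
print(low_complexity_percent(toy_rows, 30))
```

Under this reading, 264 masked residues in a 954-residue protein would give 27.67 %.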

Prosite for DKFZphtes3_8gll.2 

PS00017 839->847 ATP_GTP_A PDOC00017 

(No Pfam data available for DKFZphtes3_8gll.2) 
DKFZphtes3_8g5 



group: testes derived 

DKFZphtes3_8g5 encodes a novel 544 amino acid protein nearly identical to the human KIAA0875 
protein. 

The novel protein is a new splice variant of KIAA0875. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



KIAA0875, alternatively spliced 

complete cDNA, complete cds, EST hits 

Sequenced by MediGenomix 

Locus : unknown 

Insert length: 2762 bp 

No poly A stretch found, no polyadenylation signal found 



1 CCGACATCGG CCGTGTCTCC AGCACCTGCC GGCGGCTGCG CGAGCTGTGC 
51 CAGAGCAGCG GGAAGGTGTG GAAGGAGCAG TTCCGGGTGA GGTGACCTTC 
101 CCTTATGAAA CACTACAGCC CCACCGACTA CGTCAATTGG TTGGAAGAGT 
151 ATAAAGTTCG GCAAAAAGCT GGGTTAGAAG CGCGGAAGAT TGTAGCCTCG 
201 TTCTCAAAGA GGTTCTTTTC AGAGCACGTT CCTTGTAATG GCTTCAGTGA 
251 CATTGAGAAC CTTGAAGGAC CAGAGATTTT TTTTGAGGAT GAACTGGTGT 
301 GTATCCTAAA TATGGAAGGA AGAAAAGCTT TGACCTGGAA ATACTACGCA 
351 AAAAAAATTC TTTACTACCT GCGGCAACAG AAGATCTTAA ATAATCTTAA 
401 GGCCTTTCTT CAGCAGCCAG ATGACTATGA GTCGTATCTT GAAGGTGCTG 
451 TATATATTGA CCAGTACTGC AATCCTCTCT CCGACATCAG CCTCAAAGAC 
501 ATCCAGGCCC AAATTGACAG CATCGTGGAG CTTGTTTGCA AAACCCTTCG 
551 GGGCATAAAC AGTCGCCACC CCAGCTTGGC CTTCAAGGCA GGTGAATCAT 
601 CCATGATAAT GGAAATAGAA CTCCAGAGCC AGGTGCTGGA TGCCATGAAC 
651 TATGTCCTTT ACGACCAACT GAAGTTCAAG GGGAATCGAA TGGATTACTA 
701 TAATGCCCTC AACTTATATA TGCATCAGGT TTTGATTCGC AGAACAGGAA 
751 TCCCAATCAG CATGTCTCTG CTCTATTTGA CAATTGCTCG GCAGTTGGGA 
801 GTCCCACTGG AGCCTGTCAA CTTCCCAAGT CACTTCTTAT TAAGGTGGTG 
851 CCAAGGCGCA GAAGGGGCGA CCCTGGACAT CTTTGACTAC ATCTACATAG 
901 ATGCTTTTGG GAAAGGCAAG CAGCTGACAG TGAAAGAATG CGAGTACTTG 
951 ATCGGCCAGC ACGTGACTGC AGCACTGTAT GGGGTGGTCA ATGTCAAGAA 
1001 GGTGTTACAG AGAATGGTGG GAAACCTGTT AAGCCTGGGG AAGCGGGAAG 
1051 GCATCGACCA GTCATACCAG CTCCTGAGAG ACTCGCTGGA TCTCTATCTG 
1101 GCAATGTACC CGGACCAGGT GCAGCTTCTC CTCCTCCAAG CCAGGCTTTA 
1151 CTTCCACCTG GGAATCTGGC CAGAGAAGTC TTTCTGTCTT GTTTTGAAGG 
1201 TGCTTGACAT CCTCCAGCAC ATCCAAACCC TAGACCCGGG GCAGCACGGG 
1251 GCGGTGGGCT ACCTGGTGCA GCACACTCTA GAGCACATTG AGCGCAAAAA 
1301 GGAGGAGGTG GGCGTAGAGG TGAAGCTGCG CTCCGATGAG AAGCACAGAG 
1351 ATGTCTGCTA CTCCATCGGG CTCATTATGA AGCATAAGAG GTATGGCTAT 
1401 AACTGTGTGA TCTACGGCTG GGACCCCACC TGCATGATGG GACACGAGTG 
1451 GATCCGGAAC ATGAACGTCC ACAGCCTGCC GCACGGCCAC CACCAGCCTT 
1501 TCTATAACGT GCTGGTGGAG GACGGCTCCT GTCGATACGC AGCCCAAGAA 
1551 AACTTGGAAT ATAACGTGGA GCCTCAAGAA ATCTCACACC CTGACGTGGG 
1601 ACGCTATTTC TCAGAGTTTA CTGGCACTCA CTACATCCCA AACGCAGAGC 
1651 TGGAGATCCG GTATCCAGAA GATCTGGAGT TTGTCTATGA AACGGTGCAG 
1701 AATATTTACA GTGCAAAGAA AGAGAACATA GATGAGTAAA GTCTAGAGAG 
1751 GACATTGCAC CTTTGCTGCT GCTGCTATCT TCCAAGAGAA CGGGACTCCG 
1801 GAAGAAGACG TCTCCACGGA GCCCTCGGGA CCTGCTGCAC CAGGAAAGCC 
1851 ACTCCACCAG TAGTGCTGGT TGCCTCCTAC TAAGTTTAAA TACCGTGTGC 
1901 TCTTCCCCAG CTGCAAAGAC AATGTTGCTC TCCGCCTACA CTAGTGAATT 
1951 AATCTGAAAG GCACTGTGTC AGTGGCATGG CTTGTATGCT TGTCCTGTGG 
2001 TGACAGTTTG TGACATTCTG TCTTCATGAG GTCTCACAGT CGACGCTCCT 
2051 GTAATCATTC TTTGTATTCA CTCCATTCCC CTGTCTGTCT GCATTTGTCT 
2101 CAGAACATTT CCTTGGCTGG ACAGATGGGG TTATGCATTT GCAATAATTT 
2151 CCTTCTGATT TCTCTGTGGA ACGTGTTCGG TCCCGAGTGA GGACTGTGTG 
2201 TCTTTTTACC CTGAAGTTAG TTGCATATTC AGAGGTAAAG TTGTGTGCTA 
2251 TCTTGGCAGC ATCTTAGAGA TGGAGACATT AACAAGCTAA TGGTAATTAG 
2301 AATCATTTGA ATTTATTTTT TTCTAATATG TGAAACACAG ATTTCAAGTG 
2351 TTTTATCTTT TTTTTTTTTA AATTTAAATG GGAATATAAC ACAGTTTTCC 
2401 CTTCCATATT CCTCTCTTGA GTTTATGCAC ATCTCTATAA ATCATTAGTT 
2451 TTCTATTTTA TTACATAAAA TTCTTTTAGA AAATGCAAAT AGTGAACTTT 
2501 GTGAATGGAT TTTTCCATAC TCATCTACAA TTCCTCCATT TTAAATGACT 
2551 ACTTTTATTT TTTAATTTAA AAAATCTACT TCAGTATCAT GAGTAGGTCT 
2601 TACATCAGTG ATGGGTTCTT TTTGTAGTGA GACATACAAA TCTGATGTTA 
2651 ATGTTTGCTC TTAGAAGTCA TACTCCATGG TCTTCAAAGA CCAAAAAATG 
2701 AGGTTTTGCT TTTGTAATCA GGAAAAAAAA AATTAATGAA CCTTAAAAAA 
2751 AAAAAAAAAA GG 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 105 bp to 1736 bp; peptide length: 544 
Category: known protein 
Classification: unclassified 
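
The ORF coordinates, the reading frame and the peptide length above can be cross-checked with simple arithmetic. The Python sketch below assumes 1-based, inclusive coordinates that exclude the stop codon (in the sequence as listed, 1736 is the last base of the final GAG codon and TAA follows at 1737-1739); the function names are illustrative.

```python
def reading_frame(start_bp):
    """Forward reading frame (1, 2 or 3) of a 1-based start coordinate."""
    return (start_bp - 1) % 3 + 1


def orf_codons(start_bp, end_bp):
    """Number of codons in an ORF with inclusive, 1-based coordinates."""
    span = end_bp - start_bp + 1
    if span % 3:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


# ORF from 105 bp to 1736 bp, as listed for DKFZphtes3_8g5
print(reading_frame(105), orf_codons(105, 1736))
```

Both values agree with the listing: frame 3 and a peptide length of 544.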



1 MKHYSPTDYV NWLEEYKVRQ KAGLEARKIV ASFSKRFFSE HVPCNGFSDI 

51 ENLEGPEIFF EDELVCILNM EGRKALTWKY YAKKILYYLR QQKILNNLKA 

101 FLQQPDDYES YLEGAVYIDQ YCNPLSDISL KDIQAQIDSI VELVCKTLRG 

151 INSRHPSLAF KAGESSMIME IELQSQVLDA MNYVLYDQLK FKGNRMDYYN 

201 ALNLYMHQVL IRRTGIPISM SLLYLTIARQ LGVPLEPVNF PSHFLLRWCQ 

251 GAEGATLDIF DYIYIDAFGK GKQLTVKECE YLIGQHVTAA LYGVVNVKKV 

301 LQRMVGNLLS LGKREGIDQS YQLLRDSLDL YLAMYPDQVQ LLLLQARLYF 

351 HLGIWPEKSF CLVLKVLDIL QHIQTLDPGQ HGAVGYLVQH TLEHIERKKE 

401 EVGVEVKLRS DEKHRDVCYS IGLIMKHKRY GYNCVIYGWD PTCMMGHEWI 

451 RNMNVHSLPH GHHQPFYNVL VEDGSCRYAA QENLEYNVEP QEISHPDVGR 

501 YFSEFTGTHY IPNAELEIRY PEDLEFVYET VQNIYSAKKE NIDE 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_8g5, frame 3 

TREMBLNEW:AB020682_1 gene: "KIAA0875"; product: "KIAA0875 protein"; 
Homo sapiens mRNA for KIAA0875 protein, partial cds., N = 1, Score = 
2832, P = 5.5e-295 

>TREMBLNEW:AB020682_1 gene: "KIAA0875"; product: "KIAA0875 protein"; Homo 
sapiens mRNA for KIAA0875 protein, partial cds. 
Length = 621 

HSPs: 

Score = 2832 (424.9 bits), Expect = 5.5e-295, P = 5.5e-295 
Identities = 537/544 (98%), Positives = 537/544 (98%) 

Query:     1 MKHYSPTDYVNWLEEYKVRQKAGLEARKIVASFSKRFFSEHVPCNGFSDIENLEGPEIFF 60 
             MKHYSPTDYVNWLEEYKVRQKAGLEARKIVASFSKRFFSEHVPCNGFSDIENLEGPEIFF 
Sbjct:    85 MKHYSPTDYVNWLEEYKVRQKAGLEARKIVASFSKRFFSEHVPCNGFSDIENLEGPEIFF 144 

Query:    61 EDELVCILNMEGRKALTWKYYAKKILYYLRQQKILNNLKAFLQQPDDYESYLEGAVYIDQ 120 
             EDELVCILNMEGRKALTWKYYAKKILYYLRQQKILNNLKAFLQQPDDYESYLEGAVYIDQ 
Sbjct:   145 EDELVCILNMEGRKALTWKYYAKKILYYLRQQKILNNLKAFLQQPDDYESYLEGAVYIDQ 204 

Query:   121 YCNPLSDISLKDIQAQIDSIVELVCKTLRGINSRHPSLAFKAGESSMIMEIELQSQVLDA 180 
             YCNPLSDISLKDIQAQIDSIVELVCKTLRGINSRHPSLAFKAGESSMIMEIELQSQVLDA 
Sbjct:   205 YCNPLSDISLKDIQAQIDSIVELVCKTLRGINSRHPSLAFKAGESSMIMEIELQSQVLDA 264 

Query:   181 MNYVLYDQLKFKGNRMDYYNALNLYMHQVLIRRTGIPISMSLLYLTIARQLGVPLEPVNF 240 
             MNYVLYDQLKFKGNRMDYYNALNLYMHQVLIRRTGIPISMSLLYLTIARQLGVPLEPVNF 
Sbjct:   265 MNYVLYDQLKFKGNRMDYYNALNLYMHQVLIRRTGIPISMSLLYLTIARQLGVPLEPVNF 324 

Query:   241 PSHFLLRWCQGAEGATLDIFDYIYIDAFGKGKQLTVKECEYLIGQHVTAALYGVVNVKKV 300 
             PSHFLLRWCQGAEGATLDIFDYIYIDAFGKGKQLTVKECEYLIGQHVTAALYGVVNVKKV 
Sbjct:   325 PSHFLLRWCQGAEGATLDIFDYIYIDAFGKGKQLTVKECEYLIGQHVTAALYGVVNVKKV 384 

Query:   301 LQRMVGNLLSLGKREGIDQSYQLLRDSLDLYLAMYPDQVQLLLLQARLYFHLGIWPEKSF 360 
             LQRMVGNLLSLGKREGIDQSYQLLRDSLDLYLAMYPDQVQLLLLQARLYFHLGIWPEK   
Sbjct:   385 LQRMVGNLLSLGKREGIDQSYQLLRDSLDLYLAMYPDQVQLLLLQARLYFHLGIWPEK-- 442 

Query:   361 CLVLKVLDILQHIQTLDPGQHGAVGYLVQHTLEHIERKKEEVGVEVKLRSDEKHRDVCYS 420 
                  VLDILQHIQTLDPGQHGAVGYLVQHTLEHIERKKEEVGVEVKLRSDEKHRDVCYS 
Sbjct:   443 -----VLDILQHIQTLDPGQHGAVGYLVQHTLEHIERKKEEVGVEVKLRSDEKHRDVCYS 497 

Query:   421 IGLIMKHKRYGYNCVIYGWDPTCMMGHEWIRNMNVHSLPHGHHQPFYNVLVEDGSCRYAA 480 
             IGLIMKHKRYGYNCVIYGWDPTCMMGHEWIRNMNVHSLPHGHHQPFYNVLVEDGSCRYAA 
Sbjct:   498 IGLIMKHKRYGYNCVIYGWDPTCMMGHEWIRNMNVHSLPHGHHQPFYNVLVEDGSCRYAA 557 

Query:   481 QENLEYNVEPQEISHPDVGRYFSEFTGTHYIPNAELEIRYPEDLEFVYETVQNIYSAKKE 540 
             QENLEYNVEPQEISHPDVGRYFSEFTGTHYIPNAELEIRYPEDLEFVYETVQNIYSAKKE 
Sbjct:   558 QENLEYNVEPQEISHPDVGRYFSEFTGTHYIPNAELEIRYPEDLEFVYETVQNIYSAKKE 617 

Query:   541 NIDE 544 
             NIDE 
Sbjct:   618 NIDE 621 
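
The "Identities" figures reported for each HSP can be recomputed from an aligned query/subject pair. The sketch below is illustrative only: the helper name `identity_stats` is hypothetical, gaps are assumed to be written as `-`, and the percentage is truncated rather than rounded, which is an assumption that happens to agree with the figures printed in this listing (537/544 is reported as 98%).

```python
def identity_stats(query, sbjct):
    """Return (identical_columns, total_columns, truncated_percent)
    for two equal-length aligned strings in which '-' marks a gap."""
    if len(query) != len(sbjct):
        raise ValueError("aligned strings must have equal length")
    columns = len(query)
    identical = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    # Integer division truncates the percentage instead of rounding it.
    return identical, columns, identical * 100 // columns


# Tail of the block where the subject carries a two-residue gap.
print(identity_stats("HLGIWPEKSF", "HLGIWPEK--"))
```

A gap column never counts as an identity, so the seven gap positions in the subject account for the difference between 537 and 544 in the HSP above.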








Pedant information for DKFZphtes3_8g5, frame 3 

Report for DKFZphtes3_8g5.3 

[LENGTH] 544 
[MW] 63307.22 
[pi] 5.82 
[HOMOL] TREMBL:AB020682_1 gene: "KIAA0875"; product: "KIAA0875 protein"; Homo sapiens mRNA for KIAA0875 protein, partial cds. 0.0 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 1.84 % 



SEQ MKHYSPTDYVNWLEEYKVRQKAGLEARKIVASFSKRFFSEHVPCNGFSDIENLEGPEIFF 

SEG 

PRD cccccccccchhhhhhhhhhhhhchhhhhhhhhhhhhhhcccccccccccccccccceee 

SEQ EDELVCILNMEGRKALTWKYYAKKILYYLRQQKILNNLKAFLQQPDDYESYLEGAVYIDQ 

SEG 

PRD eeeeeeeeeeccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccceeecceeeeeee 

SEQ YCNPLSDISLKDIQAQIDSIVELVCKTLRGINSRHPSLAFKAGESSMIMEIELQSQVLDA 

SEG 

PRD ccccccccchhhhhhhhhhhhhhhhhhcccccccccceeeecccchhhhhhhhhhhhhhh 

SEQ MNYVLYDQLKFKGNRMDYYNALNLYMHQVLIRRTGIPISMSLLYLTIARQLGVPLEPVNF 

SEG 

PRD hhhhhccccccccccchhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhcccccccccc 

SEQ PSHFLLRWCQGAEGATLDIFDYIYIDAFGKGKQLTVKECEYLIGQHVTAALYGVVNVKKV 

SEG 

PRD cceeeeeeccccccceeeeeeeeeeeccccceeeeeehhhhhhhhhhhhhhhhhhhhhhh 

SEQ LQRMVGNLLSLGKREGIDQSYQLLRDSLDLYLAMYPDQVQLLLLQARLYFHLGIWPEKSF 

SEG 

PRD hhhhhccchhhhhhhhccccccchhhhhhhhhhhccchhhhhhhhhhhhhhcccccceee 

SEQ CLVLKVLDILQHIQTLDPGQHGAVGYLVQHTLEHIERKKEEVGVEVKLRSDEKHRDVCYS 

SEG xxxxxxxxxx 

PRD ehhhhhhhhhhhhhccccccccchhhhhhhhhhhhhhhhhhhheeeeecccccceeeecc 

SEQ IGLIMKHKRYGYNCVIYGWDPTCMMGHEWIRNMNVHSLPHGHHQPFYNVLVEDGSCRYAA 

SEG 

PRD cccchhhhhhhceeeeecccccccchhhhhhhhhhhccccccccccceeeeecccceeee 

SEQ QENLEYNVEPQEISHPDVGRYFSEFTGTHYIPNAELEIRYPEDLEFVYETVQNIYSAKKE 

SEG 

PRD hhhhhhhhcccccccccceeeeccccccccccchhhhhhccchhhhhhhhhhhhhccccc 

SEQ NIDE 

SEG 

PRD cccc 



(No Prosite data available for DKFZphtes3_8g5.3) 
(No Pfam data available for DKFZphtes3_8g5.3) 
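
The LOW_COMPLEXITY keyword in the report above appears to be the fraction of residues masked by SEG. As a hedged illustration, the sketch below assumes that each `x` in a SEG mask line marks one masked residue and that the percentage is rounded to two decimals; the function name `low_complexity_percent` and the `-` filler character are hypothetical.

```python
def low_complexity_percent(seg_mask):
    """Percentage of positions flagged 'x' in a SEG-style mask string
    that has one character per residue."""
    if not seg_mask:
        raise ValueError("mask must not be empty")
    masked = seg_mask.count("x")
    return round(100.0 * masked / len(seg_mask), 2)


# Ten masked residues in a 544-residue peptide, as in the report above.
mask = "-" * 534 + "x" * 10
print(low_complexity_percent(mask))
```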






DKFZphtes3_8m10 



group: nucleic acid management 

DKFZphtes3_8m10 encodes a novel 221 amino acid protein with strong similarity to 
polyadenylate-binding proteins. 

The poly(A)-binding protein (PABP) binds to the 3'-poly(A) tail found on most 
eukaryotic messenger RNAs (mRNAs) and, together with the poly(A) tail, has been implicated in governing the 
stability and the translation of mRNA. 

The new protein can find application in modulation of mRNA translation and 
processing/stability. 

strong similarity to polyadenylate-binding protein 

frame shift at Bp 707-710 

Sequenced by MediGenomix 

Locus : unknown 

Insert length: 2107 bp 

Poly A stretch at pos. 2052, polyadenylation signal at pos. 2033 



1 CGGAAAGGTC GCGGCTTGTG TGCCTGCGGG CAGCCGTGCC GAGAATGAAC 
51 CCCAGCACCC CCAGCTACCC AACGGCCTCG CTCTACGTGG GGGACCTCCA 
101 CCCCGACGTG ACTGAGGCGA TGCTCTACGA GAAGTTCAGC CCGGCAGGGC 
151 CCATCCTCTC CATCCGGATC TGCAGGGACT TGATCACCAG CGGCTCCTCC 
201 AACTACGCGT ATGTGAACTT CCAGCATACG AAGGACGCGG AGCATGCTCT 
251 GGACACCATG AATTTTGATG TTATAAAGGG CAAGCCAGTA CGCATCATGT 
301 GGTCTCAGCG TGATCCATCA CTTCGAAAAA GTGGAGTGGG CAACATATTC 
351 GTTAAAAATC TGGATAAGTC CATTAATAAT AAAGCACTGT ATGATACAGT 
401 TTCTGCTTTT GGTAACATCC TTTCGTGTAA CGTGGTTTGT GATGAAAATG 
451 GTTCCAAGGG TTATGGATTT GTACACTTTG AGACACACGA AGCAGCTGAA 
501 AGAGCTATTA AAAAAATGAA CGGAATGCTC CTAAATGGTC GCAAAGTATT 
551 TGTTGGACAA TTTAAGTCTC GTAAAGAACG AGAAGCTGAA CTTGGAGCTA 
601 GGGCAAAAGA GTTCCCCAAT GTTTACATCA AGAATTTTGG AGAAGACATG 
651 GATGATGAGC GCCTTAAGGA TCTCTTTGGC AAGTTCGGGC CCGCCTTAAG 
701 TGTGAATTAA TGACCGATGA AAGTGGAAAA TCCAAAGGAT TTGGATTTGT 
751 AAGCTTTGAA AGGCATGAAG ATGCACAGAA AGCTGTAGAT GAGATGAATG 
801 GAAAGGAGCT CAATGGAAAA CAAATTTACG TTGGTCGAGC TCAGAAAAAA 
851 GTGGAACGGC AGACGGAACT TAAGCGCACA TTTGAACAGA TGAAGCAAGA 
901 TAGGATCACC AGATACCAGG TTGTTAATCT TTATGTGAAA AATCTTGATG 
951 ATGGTATTGA TGATGAACGT CTCCGGAAAG CGTTTTCTCC ATTTGGTACA 
1001 ATCACTAGTG CAAAGGTTAT GATGGAAGGT GGTCGCAGCA AAGGGTTTGG 
1051 TTTTGTATGT TTCTCCTCCC CAGAAGAAGC CACTAAAGCA GTTACAGAAA 
1101 TGAACGGTAG AATTGTGGCC ACAAAGCCAT TGTATGTAGC TTTAGCTCAG 
1151 CGCAAAGAAG AGCGCCAGGC TTACCTCACT AACGAGTATA TGCAGAGAAT 
1201 GGCAAGTGTA CGAGCTGTGC CCAACCAGCG AGCACCTCCT TCAGGTTACT 
1251 TCATGACAGC TGTCCCACAG ACTCAGAACC ATGCTGCATA CTATCCTCCT 
1301 AGCCAAATTG CTCGACTAAG ACCAAGTCCT CGCTGGACTG CTCAGGGTGC 
1351 CAGACCTCAT CCATTCCAAA ATAAGCCCAG TGCTATCCGC CCAGGTGCTC 
1401 CTAGAGTACC ATTTAGTACT ATGAGACCAG CTTCTTCACA GGTTCCACGA 
1451 GTCATGTCAA CGCAGCGTGT TGCTAACACA TCAACACAGA CAGTGGGTCC 
1501 ACGTCCTGCA GCTGCTGCTG CTGCTGCAGC TACCCCTGCT GTGCGCACGG 
1551 TTCCACGGTA TAAATATGCT GCGGGAGTTC GCAATCCTCA GCAACATCGT 
1601 AATGCACAGC CACAAGTTAC AATGCAACAG CTTGCTGTTC ATGTACAAGG 
1651 TCAGGAAACT TTGACTGCCT CCAGGTTGGC ATCTGCCCCT CCTCAAAAGC 
1701 AAAAGCAAAT GTTAGGTGAA CGGCTCTTTC CTCTTATTCA AGCCATGCAC 
1751 CCTACTCTTG CTGGGAAAAT CACTGGCATG TTGTTGGAGA TTGATAATTC 
1801 AGAACTTCTT TATATGCTCG AGTCTCCAGA GTCACTCCGT TCTAAGGTTG 
1851 ATGAAGCTGT AGCTGTACTA CAAGCCCACC AAGCTAAAGA GGCTACCCAG 
1901 AAAGCAGTTA ACAGTGCTAC CGGTGTTCCA ACTGTTTAAA ATTGATCAGA 
1951 GACCACGAAA AGAAATTTGT GCTTCACCGA AGAAAAATAT CTAAACATCG 
2001 AGAAACTATG GGAAAAAAAA TTGCAAAATC TAAAATAAAA AATGCAAAAT 
2051 CTAAAATAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2101 AAAAAGG 
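
The polyadenylation signal position quoted for this insert can be checked by searching the 3' end for the canonical hexamers. This is a minimal sketch, not the annotation pipeline's actual method: the function name `find_polya_signals` is hypothetical, the motif set (AATAAA and ATTAAA) is an assumption, and the reported position is assumed to be a zero-based offset, which the listing does not state.

```python
def find_polya_signals(seq, motifs=("AATAAA", "ATTAAA"), offset=0):
    """Return sorted (position, motif) pairs for every motif occurrence.

    Positions are zero-based offsets into the full insert; `offset`
    is the zero-based offset of `seq` within that insert.
    """
    seq = seq.upper()
    hits = []
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            hits.append((start + offset, motif))
            start = seq.find(motif, start + 1)
    return sorted(hits)


# Nucleotides 2001-2050 of the insert, copied from the listing above.
tail = "AGAAACTATGGGAAAAAAAATTGCAAAATCTAAAATAAAAAATGCAAAAT"
print(find_polya_signals(tail, offset=2000))
```

Under the zero-based assumption, the single AATAAA hit in this window coincides with the "polyadenylation signal at pos. 2033" stated for the insert.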



BLAST Results 



Entry HSPOLYAB from database EMBL: 

Human mRNA for polyA binding protein 

Score = 5420, P = 0.0e+00, identities = 1162/1243 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 707 bp to 1936 bp; peptide length: 410 
Category: strong similarity to known protein 
Classification: unset 
Prosite motifs: RNP_1 (10-18) 
RNP_1 (112-120) 



1 LMTDESGKSK GFGFVSFERH EDAQKAVDEM NGKELNGKQI YVGRAQKKVE 
51 RQTELKRTFE QMKQDRITRY QVVNLYVKNL DDGIDDERLR KAFSPFGTIT 
101 SAKVMMEGGR SKGFGFVCFS SPEEATKAVT EMNGRIVATK PLYVALAQRK 
151 EERQAYLTNE YMQRMASVRA VPNQRAPPSG YFMTAVPQTQ NHAAYYPPSQ 
201 IARLRPSPRW TAQGARPHPF QNKPSAIRPG APRVPFSTMR PASSQVPRVM 
251 STQRVANTST QTVGPRPAAA AAAAATPAVR TVPRYKYAAG VRNPQQHRNA 
301 QPQVTMQQLA VHVQGQETLT ASRLASAPPQ KQKQMLGERL FPLIQAMHPT 
351 LAGKITGMLL EIDNSELLYM LESPESLRSK VDEAVAVLQA HQAKEATQKA 
401 VNSATGVPTV 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_8m10, frame 2 

PIR:DNHUPA polyadenylate-binding protein - human, N = 1, Score = 1931, 
P = 1.7e-199 

PIR:I48718 poly(A) binding protein - mouse, N = 1, Score = 1928, P = 
3.6e-199 



>PIR:DNHUPA polyadenylate-binding protein - human 
Length = 633 

HSPs: 



Score = 1931 (289.7 bits), Expect = 1.7e-199, P = 1.7e-199 
Identities = 384/415 (92%), Positives = 394/415 (94%) 



Query: 


1 


LMTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRTFE 


60 






+MTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQI YVGRAQKKVE RQTELKR FE 




Sbjct: 


219 


VMTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRKFE 


278 


Query: 


61 


QMKQDRITRYQVVNLYVKNLDDGIDDERLRKAFSPFGTITSAKVMMEGGRSKGFGFVCFS 


120 






QMKQDRITRYQ VNLYVKNLDDGIDDERLRK FSPFGTITSAKVMMEGGRSKGFGFVCFS 




Sbjct : 


279 


QMKQDRITRYQGVNLYVKNLDDGIDDERLRKEFSPFGTITSAKVMMEGGRSKGFGFVCFS 


338 


Query: 


121 


SPEEATKAVTEMNGRIVATKPLYVALAQRKEERQAYLTNEYMQRMASVRAVPN Q 


174 






SPEEATKAVTEMNGRIVATKPLYVALAQRKEERQA+LTN+YMQRMASVRAVPN Q 




Sbjct : 


339 


SPEEATKAVTEMNGRIVATKPLYVALAQRKEERQAHLTNQYMQRMASVRAVPNPVINPYQ 


398 


Query : 


175 


RAPPSGYFMTAVPQTQNHAAYYPPSQIARLRPSPRWTAQGARPHPFQNKPSAIRPGAPRV 


234 






APPSGYFM A+PQTQN AAYYPPSQ+A+LRPSPRWTAQGARPHPFQN P AIRP APR 




Sbjct: 


399 


PAPPSGYFMAAIPQTQNRAAYYPPSQVAQLRPSPRWTAQGARPHPFQNMPGAIRPAAPRP 


458 


Query: 


235 


PFSTMRPASSQVPRVMSTQRVANTSTQTVGPRPAAAAAAAATPAVRTVPRYKYAAGVRNP 


294 






PFSTMRPASSQVPRVMSTQRVANTSTQT+GPRPAAAAAAA TPAVRTVP+YKYAAGVRNP 




Sbjct : 


459 


PFSTMRPASSQVPRVMSTQRVANTSTQTMGPRPAAAAAAA-TPAVRTVPQYKYAAGVRNP 


517 


Query : 


295 


QQHRNAQPQVTMQQLA VHVQGQETLT AS RLASAPPQKQKQMLGERL FPL I QAMHPTLAGK 


354 






QQH NAQPQVTMQQ AVHVQGQE LTAS LASAPPQ+QKQMLGERL FPL I QAMHPTLAGK 




Sbjct: 


518 


QQHLNAQPQVTMQQPAVHVQGQEPLTASMLASAPPQEQKQMLGERLFPL I QAMHPTLAGK 


577 


Query: 


355 


ITGMLLE I DNS ELLYMLES PES LRSKVDEAVAVLQAHQAKEATQKA VNSATGVPTV 410 








ITGMLLEI DNSELL+MLESPESLRSKVDEAVAVLQAHQAKEA QKAVNSATGVPTV 




Sbjct: 


578 


ITGMLLEIDNSELLHMLESPESLRSKVDEAVAVLQAHQAKEAAQKAVNSATGVPTV 633 




Score = 315 (47.3 bits), Expect = 1.9e-27, P = 1.9e-27 
Identities = 71/163 (43%), Positives = 102/163 (62%) 



Query: 


1 


LMTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRTFE 


60 






++ DE+G SKG+GFV FE E A++A+++MNG LN ++++VGR + + ER+ EL + 




Sbjct: 


130 


WCDENG-SKGYGFVHFETQEAAERAIEKMNGMLLNDRKVFVGRFKSRKEREAELGARAK 


188 


Query: 


61 


QMKQDRITRYQWNLYVKNLDDGIDDERLRKAFSPFGTITSAKVMM-EGGRSKGFGFVCF 


119 






+ N+Y+KN + +DDERL+ F P S KVM E G+SKGFGFV F 




Sbjct: 


189 


EF TNVYIKNFGEDMDDERLKDLFGP— ALSVKVMTDESGKSKGFGFVSF 


i J3 


Query: 


120 


SSPEEATKAVTEMNGRIVATKPLYVALAQRKEERQAYLTNEYMQ 163 






E+A KAV EMNG+ + K +YV AQ+K ERQ L ++ Q 




Sbjct: 


236 


ERHEDAQKAVDEMNGKELNGKQI Y VG RAQKKVE RQT EL KRK FEQ 279 




Score = 214 (32.1 bits), Expect = 1.9e-14, P = 1.9e-14 
Identities = 50/150 (33%), Positives = 87/150 (58%) 




Query: 


8 


KSKG FGFVS FERHED AQKAVDEMNGKELNGKQI Y VG RAQKKVE RQT EL KRT FEQMKQDRI 


67 






+S G+ +V+F++ DA++A+D MN + GK + + +Q R L+++ 




Sbjct: 


50 


RSLGYAYVNFQQPADAERALDTMNFDVIKGKPVRIMWSQ RDPSLRKS 


96 


Query: 


68 


TRYQWNLYVKNLDDGIDDERLRKAFSPFGTITSAKVMMEGGRSKGFGFVCFSSPEEATK 


127 




V N+++KNLD ID++ L FS FG I S KV+ + SKG+GFV F + E A + 




Sbjct: 


97 


GVGNIFIKNLDKSIDNKALYDTFSAFGNILSCKVVCDENGSKGYGFVHFETQEAAER 


153 


Query: 


128 


AV T EMN GRI VAT K PL Y V AL AQRK EERQA YL 157 








A+ +MNG ++ + ++V + ++ER+A L 




Sbjct: 


154 


AIEKMNGMLLNDRKVFVGRFKSRKEREAEL 183 




Score = 120 (18.0 bits), Expect = 4.8e-04, P = 4.8e-04 
Identities = 30/99 (30%), Positives = 54/99 (54%) 




Query: 


70 


YQVVNLYVKNLDDGIDDERLRKAFSPFGTITSAKVM — MEGGRSKGFGFVCFSSPEEATK 


127 






Y + +LYV +L + + L + FSP G I S +V M RS G+ +V F P +A + 




Sbjct: 


8 


Y PMAS L Y VG DL HP D VT EAML Y EKFS PAG P I LS I RVC R DM I T RRS LG Y A Y VN FQQ P AD AE R 


67 


Query: 


128 


AVTEMNGRI VATKPLYVALAQRKEE-RQAYLTNEYMQRM 165 








A+ MN ++ KP+ + +QR R++ + N +++ + 




Sbjct: 


68 


ALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFIKNL 106 








Peptide information for frame 3 





ORF from 45 bp to 707 bp; peptide length: 221 
Category: strong similarity to known protein 
Classification: unset 
Prosite motifs: RNP 1 (138-146) 



1 MNPSTPSYPT ASLYVGDLHP DVTEAMLYEK FSPAGPILSI RICRDLITSG 

51 SSNYAYVNFQ HTKDAEHALD TMNFDVIKGK PVRIMWSQRD PSLRKSGVGN 

101 IFVKNLDKSI NNKALYDTVS AFGNILSCNV VCDENGSKGY GFVHFETHEA 

151 AERAIKKMNG MLLNGRKVFV GQFKSRKERE AELGARAKEF PNVYIKNFGE 

201 DMDDERLKDL FGKFGPALSV N 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_8m10, frame 3 

SWISSPROT:PAB1_HUMAN POLYADENYLATE-BINDING PROTEIN 1 (POLY(A) BINDING 
PROTEIN 1) (PABP 1)., N = 1, Score = 1039, P = 5.7e-105 

PIR:I48718 poly(A) binding protein - mouse, N = 1, Score = 1031, P = 
4e-104 

PIR:DNHUPA polyadenylate-binding protein - human, N = 1, Score = 1009, 
P = 8.7e-102 



>SWISSPROT:PAB1_HUMAN POLYADENYLATE-BINDING PROTEIN 1 (POLY(A) BINDING 
PROTEIN 1) (PABP 1). 
Length = 636 

HSPs: 






Score = 1039 (155.9 bits), Expect = 5.7e-105, P = 5.7e-105 
Identities = 199/220 (90%), Positives = 205/220 (93%) 

Query:     1 MNPSTPSYPTASLYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDLITSGSSNYAYVNFQ 60 
             MNPS PSYP ASLYVGDLHPDVTEAMLYEKFSPAGPILSIR+CRD+IT  S  YAYVNFQ 
Sbjct:     1 MNPSAPSYPMASLYVGDLHPDVTEAMLYEKFSPAGPILSIRVCRDMITRRSLGYAYVNFQ 60 

Query:    61 HTKDAEHALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFVKNLDKSINNKALYDTVS 120 
                DAE ALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIF+KNLDKSI+NKALYDT S 
Sbjct:    61 QPADAERALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFIKNLDKSIDNKALYDTFS 120 

Query:   121 AFGNILSCNVVCDENGSKGYGFVHFETHEAAERAIKKMNGMLLNGRKVFVGQFKSRKERE 180 
             AFGNILSC VVCDENGSKGYGFVHFET EAAERAI+KMNGMLLN RKVFVG+FKSRKERE 
Sbjct:   121 AFGNILSCKVVCDENGSKGYGFVHFETQEAAERAIEKMNGMLLNDRKVFVGRFKSRKERE 180 

Query:   181 AELGARAKEFPNVYIKNFGEDMDDERLKDLFGKFGPALSV 220 
             AELGARAKEF NVYIKNFGEDMDDERLKDLFGKFGPALSV 
Sbjct:   181 AELGARAKEFTNVYIKNFGEDMDDERLKDLFGKFGPALSV 220 




Score = 275 (41.3 bits), Expect = 4.1e-23, P = 4.1e-23 
Identities = 71/233 (30%), Positives = 120/233 (51%) 




Query: 


2 


NPSTPSYPTASLYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDLITSGSSNYAYVNFQH 


61 




+PS ++++ +L + LY+ FS G ILS ++ D S + + Q 




Sbjct: 


90 


dpslrksgvgnifiknldksidnkalydtfsafgnilsckvvcdengskgygfvhfetqe 


149 


Query: 


62 


TKD-AEHALDTMNFDVIKGKPVRIMW-SQRDPSL— RKSGVGNIFVKNLDKSINNKALYD 


117 




+ A ++ M + K R +R+ L R N+++KN + ++++ L D 




Sbjct: 


150 


AAERAIEKMNGMLLNDRKVFVGRFKSRKEREAELGARAKEFTNVYIKNFGEDMDDERLKD 


209 


Query: 


118 


TVSAFGNILSCNVVCDENG-SKGYGFVHFETHEAAERAIKKMNGMLLNGRKVFVGQFKSR 


176 




FG LS V+ DE+G SKG+GFV FE HE A++A+ +MNG LNG++++VG+ + + 




Sbjct: 


210 


LFGKFGPALSVKVMTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKK 


269 


Query: 


177 


KEREAELGARAKEFP NVYIKNFGEDMDDERLKDLFGKFGPALS 2 19 






ER+ EL + ++ N+Y+KN + +DDERL+ F FG S 




Sbjct: 


270 


VERQTELKRKFEQMKQDRITRYQGVNLYVKNLDDGIDDERLRKEFSPFGTITS 322 




Score = 227 (34.1 bits), Expect = 6.3e-18, P = 6.3e-18 
Identities = 57/187 (30%), Positives = 101/187 (54%) 




Query: 


12 


SLYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDLITSGSSNYAYVNFQHTKDAEHALDT 


71 




++Y+ + D+ + L + F GP LS+++ D + S + +V+F+ +DA+ A+D 




Sbjct: 


192 


NVYIKN FGEDMDDERLKDLFGKFGPALSVKVMTDE-SGKSKGFGFVSFERHEDAQKAVDE 


250 


Query: 


72 


MNFDVIKGKPVRIMWSQR DPSLRKSGVGNIFVKNLDKS INNKA 


114 






MN + GK + + +Q+ D R GV N++VKNLD I+++ 




Sbjct: 


251 


MNG KE LNGKQ I Y VG RAQK KV E RQT EL K RK FEQMKQ DR I T RY QG V - NL Y V KN L D DG I D DER 


309 


Query: 


115 


LYDTVSAFGNILSCNVVCDENGSKGYGFVHFETHEAAERAIKKMNGMLLNGRKVFVGQFK 


174 




L S FG I S V+ + SKG+GFV F + E A +A+ +MNG ++ + ++V + 




Sbjct: 


310 


LRKEFSPFGTITSAKVMMEGGRSKGFGFVCFSSPEEATKAVTEMNGRIVATKPLYVALAQ 


369 


Query: 


175 


SRKEREAEL 183 








++ER+A L 




Sbjct: 


370 


RKEERQAHL 378 




Score = 100 (15.0 bits), Expect = 2.3e-02, P = 2.3e-02 
Identities = 26/99 (26%), Positives = 53/99 (53%) 




Query: 


8 


YPTASLYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDLITSG-SSNYAYVNFQHTKDAE 


66 






Y +LYV +L + + L ++FSP G I S ++ ++ G S + +V F ++A 




Sbjct: 


291 


YQGVNLYVKNLDDGIDDERLRKEFSPFGTITSAKV MMEGGRSKGFGFVCFSSPEEAT 


347 


Query: 


67 


HALDTMNFDVI KGKPVRIMWSQRDPSLRKSGVGN I FVKNL 106 








A+ MN ++ KP+ + +QR R++ + N +++ + 




Sbjct: 


348 


KAVTEMNGRI VATKPLYVALAQRKEE-RQAHLTNQYMQRM 386 





Pedant information for DKFZphtes3_8m10, frame 2 

Report for DKFZphtes3_8m10.2 

[LENGTH] 409 
[MW] 45235.68 
[pi] 10.08 
[HOMOL] SWISSPROT:PAB1_HUMAN POLYADENYLATE-BINDING PROTEIN 1 (POLY(A) BINDING PROTEIN 1) (PABP 1). 0.0 
[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. cerevisiae, YER165w] 1e-54 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YER165w] 1e-54 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YER165w] 1e-54 
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YER165w] 1e-54 
[FUNCAT] 04.05.99 other mrna-transcription activities [S. cerevisiae, YNL016w] 1e-15 
[FUNCAT] 11.01 stress response [S. cerevisiae, YGR159c] 1e-12 
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YGR159c] 1e-12 
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YNL175c] 4e-09 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YPR112c] 5e-08 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YHR086w] 3e-07 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YHR086w] 3e-07 
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YHR086w] 3e-07 
[FUNCAT] 04.07 rna transport [S. cerevisiae, YOL123w HRP1 - CF Ib] 9e-07 
[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YCL011c] 3e-06 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGR250c] 8e-06 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YDR432w] 2e-05 
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YDR432w] 2e-05 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YFR023w] 3e-05 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YBR212w] 3e-04 
[BLOCKS] BL00030B Eukaryotic RNA-binding region RNP-1 proteins 
[SCOP] d1sxl 4.34.7.1.3 Sex-lethal protein [(Drosophila melanogaster) 1e-17 
[PIRKW] nucleus 0.0 
[PIRKW] duplication 0.0 
[PIRKW] RNA binding 0.0 
[PIRKW] nucleolus 2e-09 
[PIRKW] tandem repeat 2e-09 
[PIRKW] single-stranded DNA binding 3e-06 
[PIRKW] DNA binding 5e-13 
[PIRKW] phosphoprotein 6e-10 
[PIRKW] ribosome 3e-08 
[PIRKW] mitochondrion 3e-08 
[PIRKW] alternative splicing 9e-11 
[PIRKW] chloroplast 2e-19 
[PIRKW] transcription regulation 2e-07 
[PIRKW] protein biosynthesis 3e-08 
[SUPFAM] nucleolin 6e-10 
[SUPFAM] glycine-rich RNA-binding protein 2e-07 
[SUPFAM] unassigned ribonucleoprotein repeat-containing proteins 2e-19 
[SUPFAM] polyadenylate-binding protein 0.0 
[SUPFAM] ribonucleoprotein repeat homology 0.0 
[PROSITE] RNP_1 2 
[PFAM] RNA recognition motif, (aka RRM, RBD, or RNP domain) 
[KW] Irregular 
[KW] 3D 
[KW] LOW_COMPLEXITY 5.62 % 



SEQ MTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRTFEQ 

SEG 

1sxl- 

SEQ MKQDRITRYQVVNLYVKNLDDGIDDERLRKAFSPFGTITSAKVMMEGGRSKGFGFVCFSS 

SEG 

1sxl- CEEEECCCTTTTHHHHHHHHTTTTCCCCCEEECTTTCTTTEEEECTTT 

SEQ PEEATKAVTEMNGRIVATKPLYVALAQRKEERQAYLTNEYMQRMASVRAVPNQRAPPSGY 

SEG 

1sxl- HHHHHHHHHHHTTTCCCCCCCBCCBCC 

SEQ FMTAVPQTQNHAAYYPPSQIARLRPSPRWTAQGARPHPFQNKPSAIRPGAPRVPFSTMRP 

SEG 

1sxl- 

SEQ ASSQVPRVMSTQRVANTSTQTVGPRPAAAAAAAATPAVRTVPRYKYAAGVRNPQQHRNAQ 

SEG xxxxxxxxxxxxxxxxxxxxxxx 

1sxl- 

SEQ PQVTMQQLAVHVQGQETLTASRLASAPPQKQKQMLGERLFPLIQAMHPTLAGKITGMLLE 

SEG 

1sxl- 

SEQ IDNSELLYMLESPESLRSKVDEAVAVLQAHQAKEATQKAVNSATGVPTV 

SEG 

1sxl- 



Prosite for DKFZphtes3_8m10.2 

PS00030 9->17 RNP_1 PDOC00030 

PS00030 111->119 RNP_1 PDOC00030 



Pfam for DKFZphtes3_8m10.2 

HMM_NAME RNA recognition motif, (aka RRM, RBD, or RNP domain) 

HMM *IYVGNLPWDtTEEDLrDlFsQFGpIvsIrMMrDReTGRSRGFAFVEFED 
+YV+NL+ +++E LR +FS+FG I+S+++M+ E GRS+GF+FV F + 
Query 74 LYVKNLDDGIDDERLRKAFSPFGTITSAKVMM--EGGRSKGFGFVCFSS 120 

HMM EEDAekAIdeMNGmeFmGRrIRV* 
+E+A+KA+ EMNG+++ ++++V 
Query 121 PEEATKAVTEMNGRIVATKPLYV 143 



Pedant information for DKFZphtes3_8m10, frame 3 

Report for DKFZphtes3_8m10.3 

[LENGTH] 235 
[MW] 26308.08 
[pi] 8.95 
[HOMOL] SWISSPROT:PAB1_HUMAN POLYADENYLATE-BINDING PROTEIN 1 (POLY(A) BINDING PROTEIN 1) (PABP 1). 1e-113 
[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. cerevisiae, YER165w] 1e-64 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YER165w] 1e-64 
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YER165w] 1e-64 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YER165w] 1e-64 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YFR023w] 1e-24 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YFR023w] 1e-24 
[FUNCAT] 04.05.99 other mrna-transcription activities [S. cerevisiae, YNL016w] 2e-19 
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YOR319w] 2e-14 
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YGR159c] 1e-11 
[FUNCAT] 11.01 stress response [S. cerevisiae, YGR159c] 1e-11 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGR250c] 1e-09 
[FUNCAT] 04.07 rna transport [S. cerevisiae, YOL123w HRP1 - CF Ib] 1e-09 
[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YCL011c] 8e-09 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YPR112c] 2e-08 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YHR086w] 2e-08 
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YBR212w] 3e-08 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YBR212w] 3e-08 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YDR432w] 3e-04 
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YDR432w] 3e-04 
[BLOCKS] BL00030B Eukaryotic RNA-binding region RNP-1 proteins 
[BLOCKS] BL00900D Bacteriophage-type RNA polymerase family proteins signatur 
[SCOP] d1sxl 4.34.7.1.3 Sex-lethal protein [(Drosophila melanogaster) 9e-23 
[SCOP] d2u1a 4.34.7.1.2 U1A protein (human (Homo sapiens) 6e-24 
[SCOP] d1up1_2 4.34.7.1.1 Nuclear ribonucleoprotein A1, RNP A1, UP1 1e-13 
[PIRKW] nucleus 1e-110 
[PIRKW] duplication 1e-110 
[PIRKW] RNA binding 1e-110 
[PIRKW] nucleolus 4e-10 
[PIRKW] tandem repeat 4e-10 
[PIRKW] single-stranded DNA binding 1e-06 
[PIRKW] DNA binding 9e-12 
[PIRKW] phosphoprotein 4e-10 
[PIRKW] mitochondrion 6e-07 
[PIRKW] heterotrimer 4e-06 
[PIRKW] alternative splicing 1e-15 
[PIRKW] chloroplast 5e-11 
[PIRKW] transcription regulation 3e-09 
[PIRKW] GTP binding 2e-06 
[SUPFAM] helix-destabilizing protein 1e-07 
[SUPFAM] nucleolin 4e-10 
[SUPFAM] glycine-rich RNA-binding protein 2e-07 
[SUPFAM] yeast HRP1 protein 2e-08 
[SUPFAM] unassigned ribonucleoprotein repeat-containing proteins 3e-25 
[SUPFAM] polyadenylate-binding protein 1e-112 
[SUPFAM] ribonucleoprotein repeat homology 1e-112 
[PROSITE] RNP_1 1 
[PFAM] RNA recognition motif, (aka RRM, RBD, or RNP domain) 
[KW] All_Beta 
[KW] 3D 

SEQ ERSRLVCLRAAVPRMNPSTPSYPTASLYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDL 
lhal- EEEETTTTTTCHHHHHHHHGGGCCEEEEEEEETT 



SEQ ITSGSSNYAYVNFQHTKDAEHALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFVKNL 

lhal- TTTCEEEEEEEEECCHHHHHHHHHHTTEEE-TT EEEEEEECTTTTCCCCCEEEEECC 

SEQ DKSINNKALYDTVSAFGNILSCNVVCDENGSKGYGFVHFETHEAAERAIKKMNGMLLNGR 

lhal- TTTTCHHHHHHHHGGGCCEEEEEEEETTTTTCEEEEEEECCHHHHHHHH 

SEQ KVFVGQFKSRKEREAELGARAKEFPNVYIKNFGEDMDDERLKDLFGKFGPALSVN 

lhal- 



Prosite for DKFZphtes3_8m10.3 

PS00030 152->160 RNP_1 PDOC00030 



Pfam for DKFZphtes3_8m10.3 

HMM_NAME RNA recognition motif, (aka RRM, RBD, or RNP domain) 

HMM *IYVGNLPWDtTEEDLrDlFsQFGpIvsIrMMrDReTGRSRGFAFVEFED 
+YVG+L +D+TE +L + FS+ GPI+SIR+ RD T S +A+V+F+ 
Query 27 LYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDLITSGSSNYAYVNFQH 75 

HMM EEDAekAIdeMNGmeFmGRrIRV* 
DAE A+D+MN ++ G+++R+ 
Query 76 TKDAEHALDTMNFDVIKGKPVRI 98 

HMM *IYVGNLPWDtTEEDLrDlFsQFGpIvsIrMMrDReTGRSRGFAFVEFED 
I+V+NL+ +++ L D S FG I+S++++ D + S+G++FV FE+ 
Query 115 IFVKNLDKSINNKALYDTVSAFGNILSCNVVCD--ENGSKGYGFVHFET 161 

HMM EEDAekAIdeMNGmeFmGRrIRV* 
+E+AE+AI +MNGM+++GR++ V 
Query 162 HEAAERAIKKMNGMLLNGRKVFV 184 

DKFZphtes3_8p7 

group: testes derived 

DKFZphtes3_8p7 encodes a novel 412 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

unknown 

2 EST hits (both from testis libraries) 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 2899 bp 

Poly A stretch at pos. 2870, polyadenylation signal at pos. 2852 



1 CCGACCCGCC CTGGGGTGCT GCGTGCGCTG CCTGCTCCCG CCTGAGGAAA 
51 ACACTGCCCA TGGCGCAAGG CCGGGAGCGC GACGAAGGCC CCCACTCCGC 
101 CGGCGGCGCG TCCTTGTCCG TGAGATGGGT GCAAGGATTC CCTAAGCAGA 
151 ATGTTCATTT GTCAACGACA ACACCATTTG CTACCCTTGT GGGAATTATG 
201 TAATATTTAT TAATATTGAA ACCAAGAAAA AGACTGTACT GCAGTGTAGT 
251 AATGGAATTG TGGGCGTCAT GGCAACTAAC ATCCCCTGTG AAGTTGTGGC 
301 TTTTTCTGAC CGGAAGCTAA AACCTCTCAT CTACGTATAC AGCTTTCCAG 
351 GATTGACCAG AAGGACCAAA TTGAAAGGCA ACATTCTCCT GGACTACACT 
401 TTACTTTCAT TCAGTTACTG TGGCACCTAC CTGGCTAGTT ACTCCTCTCT 
451 CCCAGAATTT GAACTGGCCC TTTGGAACTG GGAATCGAGT ATCATTTTGT 
501 GTAAGAAATC ACAGCCTGGA ATGGATGTGA ACCAAATGTC TTTTAACCCC 
551 ATGAACTGGC GCCAGCTGTG CTTATCAAGT CCAAGTACAG TGAGCGTGTG 
601 GACCATTGAA AGAAGTAACC AGGAGCATTG TTTCAGAGCA AGGTCGGTGA 
651 AATTACCTCT AGAAGATGGG TCATTTTTTA ATGAAACGGA TGTCGTTTTC 
701 CCCCAGTCGT TGCCGAAAGA TCTCATCTAT GGTCCCGTGC TGCCACTGTC 
751 AGCCATTGCC GGGCTGGTAG GCAAAGAGGC AGAGACTTTC CGGCCGAAAG 
801 ATGATCTATA TCCTTTGCTT CACCCGACTA TGCATTGCTG GACTCCAACA 
851 AGTGACTTGT ACATTGGCTG TGAAGAGGGT CATCTTTTAA TGATTAATGG 
901 AGACACCTTG CAAGTGACTG TACTTAATAA GATAGAAGAG GAATCGCCAT 
951 TGGAAGACAG AAGAAATTTT ATCAGTCCAG TAACCTTGGT ATATCAGAAG 
1001 GAGGGCGTGC TGGCTTCTGG AATTGATGGC TTTGTGTATT CTTTTATTAT 
1051 TAAAGATAGA AGTTACATGA TCGAGGATTT TCTTGAGATT GAAAGACCTG 
1101 TAGAACATAT GACATTTTCT CCCAATTATA CAGTGTTGCT GATTCAAACA 
1151 GACAAGGGAT CTGTTTATAT CTACACTTTT GGTAAGGAGC CAACCTTAAA 
1201 TAAAGTCCTA GATGCTTGTG ATGGGAAATT TCAGGCAATT GACTTTATCA 
1251 CACCTGGAAC CCAATACTTC ATGACACTTA CATATTCAGG GGAAATTTGT 
1301 GTTTGGTGGC TGGAGGATTG TGCTTGTGTA AGCAAGATTT ATCTGAATAC 
1351 CCTAGCAACG GTTCTGGCTT GCTGTCCATC CTCCCTCTCT GCAGCCGTGG 
1401 GCACGGAGGA TGGCTCGGTC TACTTCATCA GCGTATATGA TAAGGAATCC 
1451 CCTCAGGTCG TGCACAAGGC CTTTCTCTCG GAATCGTCCG TGCAGCACGT 
1501 CGTGTAAGTC CTTTCTGCCT CCAGGAGCGG CTCCGTGTCA CACCCGTCTG 
1551 TTGAAAATTC TAGTGAAGCC ATCCTTTCTT TTAATTTTAA GTTTTACGTG 
1601 TTTCATTTGT TTTGAATGTT AATATATTCA CACAGTTCAA CACTCAAAAG 
1651 GTACAGAGGG CTGTGTAGTA AAGTACCCCC CATACCCAGG TCTGTCCTTG 
1701 CAGGCAGCCT GGTACCAATT TCTCATGTCT CTCCTGAGAT GTTTTATCCA 
1751 TGAACAAGCA AAACATAATA AGCACTTCTT TTTACTTGTA TCAATGGCCA 
1801 TCATGTGTGT ATAGTGTGCC AGGCACTTCT GCTGTATTAA CTCCATGAGG 
1851 TAAACACTCT TGTTGTCTCT ATTTGACAGG TGAGGAAGAT AAGGCACAAG 
1901 GATTTTAAAT AACTTGCTCA ATAGTACACA GATAGTGAAT GGCAAATGTT 
1951 GGGATTTGAA CCCAGGTAGT TGGGCTGCAG AGTCACTGCC TTTGCTCTTA 
2001 AAAGGAGAAA ACTATGTACA ATGCCTCATT TCTTTTTTCA CTTAATCGTA 
2051 TATCTTGGAG AATGTTTTAT ATCCACACAT AAAGACCAGC CTGATTATTT 
2101 GTATAGCCAC ATAGTATTCC ATTATATGAA TATACTATCA TTTTTTAAAA 
2151 ACGGTATATT AATGAACATT TAGAGTATTT CAAAACTTTT GAAGCAATAC 
2201 TTTTAAGATG ATAATATAGA GACATTAGAT TTGGACTTGT AGGTGCTATC 
2251 ATTATTACTG TTTCTTTTTA ATTTATTATA TTATTAGGTA TTAATAAGAA 
2301 CAGACATTTG TATTCTGCTT TACAGCTTGA GATCACTGTA GCTTGTGGCA 
2351 TGTGATCCTC AAAACACCAG TCAGAAAGGT GTTATTCTTA TCCCTATTAG 
2401 ACAAATTAGG GAATTCAGGG TTAGAGAGGT GAGGAAAAGC ATTGTCCAAG 
2451 ATTACACATT ACACAGCTAG CACACTGAGG AGCTGGCCCT GCCACTGTGG 
2501 ACTGCCCAGC TCCACCACCC TAGCTCAGTG GGGAAGGATG GATAACCTCC 
2551 TTCCATTTAC CCCCTGCCTT TCTGCACTGT CATTTTTTTG TGCCTTTCCT 
2601 TTCTCAGATC CTCTTATTCT AATTTACATC TTCCCACTTT TTCTAATTTG 
2651 ATAAAGTTGT AGACATGTTT CACTACATTC TTCCTCCCAC TGCCAGGTAC 
2701 CAGACACAGG GTAATGAAAT GTCACACCCA CCACTAATTT GAGAATTGCT 
2751 TATTTGCGCT TGAAACATCA AGAAAGCTCT ACCGACAGAC ATGTTTCATT 
2801 CACTTATGAT GAACCAACTG CCCATCTTTA CTGAATCTTC TTGACTGTAT 
2851 TTATTAAAGT TGCAATTTGG AAATAAAAAA AAAAAAAAAA AAAAAAAAGG 
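
The nucleotide listings in this document follow a fixed layout: a 1-based start position, then up to five blocks of ten residues per line. As an illustration only, the sketch below reproduces that layout for an arbitrary sequence; the function name `format_sequence` and its parameters are hypothetical and are not taken from the original tooling.

```python
def format_sequence(seq, per_line=50, block=10):
    """Format a sequence as numbered lines of space-separated blocks.

    Each line starts with the 1-based position of its first residue,
    followed by up to per_line residues split into blocks.
    """
    lines = []
    for start in range(0, len(seq), per_line):
        chunk = seq[start:start + per_line]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        lines.append(f"{start + 1} " + " ".join(blocks))
    return lines


for line in format_sequence("A" * 57):
    print(line)
```

Because every full line holds exactly 50 residues, the leading position numbers always advance in steps of 50 (1, 51, 101, ...), which is a convenient consistency check when a listing has been re-keyed or extracted from a scan.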



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 269 bp to 1504 bp; peptide length: 412 
Category: putative protein 
Classification: no clue 



1 MATNIPCEVV AFSDRKLKPL IYVYSFPGLT RRTKLKGNIL LDYTLLSFSY 
51 CGTYLASYSS LPEFELALWN WESSIILCKK SQPGMDVNQM SFNPMNWRQL 
101 CLSSPSTVSV WTIERSNQEH CFRARSVKLP LEDGSFFNET DVVFPQSLPK 
151 DLIYGPVLPL SAIAGLVGKE AETFRPKDDL YPLLHPTMHC WTPTSDLYIG 
201 CEEGHLLMIN GDTLQVTVLN KIEEESPLED RRNFISPVTL VYQKEGVLAS 
251 GIDGFVYSFI IKDRSYMIED FLEIERPVEH MTFSPNYTVL LIQTDKGSVY 
301 IYTFGKEPTL NKVLDACDGK FQAIDFITPG TQYFMTLTYS GEICVWWLED 
351 CACVSKIYLN TLATVLACCP SSLSAAVGTE DGSVYFISVY DKESPQVVHK 
401 AFLSESSVQH VV 


BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_8p7, frame 2 



No Alert BLASTP hits found 



Pedant information for DKFZphtes3_8p7, frame 2 



Report for DKFZphtes3_8p7.2 



[LENGTH] 412 

[MW] 46476.62 

[pi] 4.91 

[KW] Alpha_Beta 



SEQ MATNIPCEVVAFSDRKLKPLIYVYSFPGLTRRTKLKGNILLDYTLLSFSYCGTYLASYSS 

PRD cccccceeeeeecccccceeeeeecccccccccccchhhhhhhheeeecccccccccccc 

SEQ LPEFELALWNWESSIILCKKSQPGMDVNQMSFNPMNWRQLCLSSPSTVSVWTIERSNQEH 

PRD cchhhhhhhhccccceeeccccccccceeeccccccceeeeeccccceeeeeeeecchhh 

SEQ CFRARSVKLPLEDGSFFNETDVVFPQSLPKDLIYGPVLPLSAIAGLVGKEAETFRPKDDL 

PRD hhhhhhhcccccccccccccccccccccccccccccccceeeeeeccccccccccccccc 

SEQ YPLLHPTMHCWTPTSDLYIGCEEGHLLMINGDTLQVTVLNKIEEESPLEDRRNFISPVTL 

PRD cccccccccccccccceeeecccceeeecccceeeeeehhhhhcccccccccccccccee 

SEQ VYQKEGVLASGIDGFVYSFIIKDRSYMIEDFLEIERPVEHMTFSPNYTVLLIQTDKGSVY 

PRD eeeceeeeecccceeeeeeeeeccchhhhhhhhhhcccceeeccccceeeeeecccccee 

SEQ IYTFGKEPTLNKVLDACDGKFQAIDFITPGTQYFMTLTYSGEICVWWLEDCACVSKIYLN 

PRD eeeccccccchhhhhcccccceeeeeccccceeeeeeeccceeeeeeecceeeeeeeehh 

SEQ TLATVLACCPSSLSAAVGTEDGSVYFISVYDKESPQVVHKAFLSESSVQHVV 

PRD hhhhhhhccccccceeeeccccceeeeeeeccccccchhhhhhhcccccccc 



(No Prosite data available for DKFZphtes3_8p7.2) 
(No Pfam data available for DKFZphtes3_8p7.2) 



DKFZphtes3_9e22 



group: testes derived 

DKFZphtes3_9e22 encodes a novel 227 amino acid protein with weak partial similarity to RING-finger 
proteins. 

For the novel protein, Pfam, but not Prosite, predicts a C3HC4-type RING finger motif. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to zinc finger proteins 

Sequenced by DKFZ 

Locus : unknown 

Insert length: 1318 bp 

Poly A stretch at pos. 1308, no polyadenylation signal found 



1 GCTCCCCCGG CTTTCGGAGC CCGGGGGCGG CCTGTGGCGC GCGGAGCCCG 

51 CGCCGGACTG CGCCTCTTTG GACCTTGAGG GGAAACATGC GTTTGCCTTG 

101 GATCGTTTGA AATTCTAAGT TTGGGATCCC CGCCCGCCCG CCTGCCTCTT 

151 CCGCCCCGCG GGTTTTTTCC TTTTTTCCTT TTGCTTTTTT TCCTTTTCTC 

201 CCTCCGGGTC TCCTTTTTGA CTCCCTCCCC CTTTATGCTC GCCCAGCCCT 

251 CCCCCTGCTG CTGAGAAGTG GGGGAGGGTC TCGGCCTCCA GGTTCCCGCC 
301 CCACCGGGGC CCGGGCGAGC ATGGGGGGCA AGCAGAGCAC GGCGGCCCGC

351 TCCCGGGGCC CCTTCCCGGG GGTCTCCACC GATGACAGCG CCGTGCCGCC 

401 GCCGGGAGGG GCGCCCCATT TCGGGCACTA CCGGACGGGC GGCGGGGCCA 

451 TGGGGCTGCG CAGCCGCTCG GTCAGCTCGG TGGCAGGCAT GGGCATGGAC 

501 CCCAGCACGG CCGGGGGGGT GCCCTTTGGC CTCTACACCC CCGCCTCCCG 

551 GGGCACCGGC GACTCCGAGA GGGCGCCCGG CGGCGGAGGG TCTGCGTCCG 

601 ACTCCACCTA TGCCCATGGC AATGGTTACC AGGAGACGGG CGGCGGTCAC 

651 CATAGAGACG GGATGCTGTA CCTGGGCTCC CGAGCCTCGC TGGCGGATGC 

701 TCTACCTCTG CACATCGCAC CCAGGTGGTT CAGCTCGCAT AGTGGTTTCA 

751 AGTGCCCCAT TTGCTCCAAG TCTGTGGCTT CTGACGAGAT GGAAATGCAC 

801 TTTATAATGT GTTTGAGCAA ACCTCGCCTC TCCTACAACG ATGATGTGCT 

851 GACTAAAGAC GCGGGTGAGT GTGTGATCTG CCTGGAGGAG CTGCTGCAGG 

901 GGGACACGAT AGCCAGGCTG CCCTGCCTGT GCATCTATCA CAAAAGCTGC 

951 ATAGACTCGT GGTTTGAAGT GAACAGATCT TGTCCGGAAC ACCCTGCGGA 

1001 CTGACCTGCG GGCTTGCTTG CTGACTCCTC TCAAAGGGAC AGAGCGCCCC 

1051 TGCTCCAGGG AGGAGGCTCA CCGGACCCTG GGGCAGAGCT GAGCTTGGGA 

1101 CACCAGCGGG AACAGGGCAC CCCTTCTGCA CTGACTTCCA GATCATGGTT 

1151 CTCCCTTCCT CCCTGAGGAC ACCAAATTGG ATGAGAGCAA GTTTGAGAGA 

1201 AGAATGAATC AACTGCTATC CTTCCCCTCA CCCCTCAGCC CAGGAGGGAA 

1251 AGGGCATTTT CTTTTTCATC TTTGAAAGGC ATTGTGGGTC TGTCTTTAAA 

1301 GTGTTTACAA AAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 321 bp to 1001 bp; peptide length: 227 
Category: similarity to known protein 
Classification: unclassified 
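
The relation between the ORF coordinates and the peptide length can be checked with a little arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes the reported coordinates are 1-based, inclusive, and exclude the stop codon, which is consistent with the figures given in this report.

```python
def orf_peptide_length(start: int, end: int) -> int:
    """Number of codons spanned by an ORF.

    Assumes 1-based inclusive coordinates that do not include the stop codon
    (an assumption inferred from the reported numbers, not stated in the report).
    """
    span = end - start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"ORF span {span} is not a positive multiple of 3")
    return span // 3


# ORF from 321 bp to 1001 bp, as reported for frame 3 of DKFZphtes3_9e22
print(orf_peptide_length(321, 1001))
```

With the coordinates above the span is 681 nucleotides, which corresponds to the reported peptide length of 227.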



1 MGGKQSTAAR SRGPFPGVST DDSAVPPPGG APHFGHYRTG GGAMGLRSRS 

51 VSSVAGMGMD PSTAGGVPFG LYTPASRGTG DSERAPGGGG SASDSTYAHG 

101 NGYQETGGGH HRDGMLYLGS RASLADALPL HIAPRWFSSH SGFKCPICSK 

151 SVASDEMEMH FIMCLSKPRL SYNDDVLTKD AGECVICLEE LLQGDTIARL 
201 PCLCIYHKSC IDSWFEVNRS CPEHPAD

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_9e22, frame 3 

TREMBL:AF078823_1 product: "RING-H2 finger protein RHA2b"; Arabidopsis
thaliana RING-H2 finger protein RHA2b mRNA, complete cds., N = 1, Score
= 111, P = 2.8e-06

TREMBL:AF078822_1 product: "RING-H2 finger protein RHA2a"; Arabidopsis
thaliana RING-H2 finger protein RHA2a mRNA, complete cds., N = 1, Score
= 112, P = 6.6e-06

TREMBL:AC004138_14 gene: "T17M13.17"; Arabidopsis thaliana chromosome
II BAC T17M13 genomic sequence, complete sequence., N = 2, Score = 123,
P = 1.4e-05

PIR:T02286 hypothetical protein T13D8.23 - Arabidopsis thaliana, N = 1,
Score = 142, P = 8.8e-08



>PIR:T02286 hypothetical protein T13D8.23 - Arabidopsis thaliana 
Length = 327

HSPs:

Score = 142 (21.3 bits), Expect = 8.8e-08, P = 8.8e-08
Identities = 24/57 (42%), Positives = 30/57 (52%) 

Query: 166 SKPRLSYNDDVLTKDAGECVICLEELLQGDTIARLPCLCIYHKSCIDSWFEVNRSCP 222 

S P + LT D +C +C+EE + G LPC IYHK CI W +N SCP 

Sbjct: 206 SLPSVKITPQHLTNDMSQCTVCMEEFIVGGDATELPCKHIYHKDCIVPWLRLNNSCP 262 
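
The "Identities" figure of an HSP can be recomputed from the two aligned strings. The following is a minimal sketch, not the scoring code of the BLAST program itself; the function name is hypothetical, and "Positives" is not reproduced because it depends on a substitution matrix that the report does not name.

```python
def count_identities(query: str, subject: str) -> tuple[int, int]:
    """Return (identical columns, aligned columns) for two equal-length aligned strings."""
    if len(query) != len(subject):
        raise ValueError("aligned strings must have the same length")
    # A column counts as an identity only if both residues match and neither is a gap.
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return identical, len(query)


# Aligned segments copied from the HSP against PIR:T02286 above
query = "SKPRLSYNDDVLTKDAGECVICLEELLQGDTIARLPCLCIYHKSCIDSWFEVNRSCP"
sbjct = "SLPSVKITPQHLTNDMSQCTVCMEEFIVGGDATELPCKHIYHKDCIVPWLRLNNSCP"

identical, aligned = count_identities(query, sbjct)
print(f"Identities = {identical}/{aligned} ({round(100 * identical / aligned)}%)")
```

For this HSP the count agrees with the reported 24/57 (42%).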



Pedant information for DKFZphtes3_9e22, frame 3 



Report for DKFZphtes3_9e22.3



[LENGTH] 227

[MW] 23782.62

[pI] 6.18

[HOMOL] PIR:T02286 hypothetical protein T13D8.23 - Arabidopsis thaliana 2e-08

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YDR313c] 4e-06

[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YOL013c] 0.001

[FUNCAT] 06.13 proteolysis [S. cerevisiae, YOL013c] 0.001

[PFAM] Zinc finger, C3HC4 type (RING finger)

[KW] Irregular



SEQ MGGKQSTAARSRGPFPGVSTDDSAVPPPGGAPHFGHYRTGGGAMGLRSRSVSSVAGMGMD 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccceeeccccccc 

SEQ PSTAGGVPFGLYTPASRGTGDSERAPGGGGSASDSTYAHGNGYQETGGGHHRDGMLYLGS 

PRD ccccccccccccccccccccccccccccccccccccccccccccccccccccccceeech 

SEQ RASLADALPLHIAPRWFSSHSGFKCPICSKSVASDEMEMHFIMCLSKPRLSYNDDVLTKD 

PRD hhhhhhhhceeecccccccccccccccccccchhhhhhhhhhhhcccccccccccccccc 

SEQ AGECVICLEELLQGDTIARLPCLCIYHKSCIDSWFEVNRSCPEHPAD 

PRD cceeeeeecccccccccccccceeeeeeccchhhhhhhhcccccccc 
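
A PRD track such as the one above can be summarised as a per-state percentage. This sketch assumes that the letters c, h and e stand for coil, helix and extended strand; the report gives no legend, so that reading is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def structure_composition(prd: str) -> dict[str, float]:
    """Percentage of each predicted state (c, h, e) in a PRD track."""
    # Keep only the three state letters; spaces and line breaks are ignored.
    states = [ch for ch in prd if ch in "che"]
    if not states:
        raise ValueError("no c/h/e states found")
    counts = Counter(states)
    return {s: round(100.0 * counts[s] / len(states), 1) for s in "che"}


# Last PRD line of the DKFZphtes3_9e22.3 report
print(structure_composition("cceeeeeecccccccccccccceeeeeeccchhhhhhhhcccccccc"))
```

Concatenating all PRD lines of a report before calling the function gives the composition for the whole peptide.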



(No Prosite data available for DKFZphtes3_9e22.3)



Pfam for DKFZphtes3_9e22.3



HMM_NAME Zinc finger, C3HC4 type (RING finger)

HMM *CPICFcTFQlDyPWPFdePmMlPCgHsFCypCIrrW CPmC* 

C IC L+++ D++ LPC+ ++ ++CI +W CP+ 

Query 184 CVIC LEELLQGDTIARLPCLCIYHKSCIDSWFEVNRSCPEH 224 
DKFZphtes3_9i20



group: testes derived 

DKFZphtes3_9i20 encodes a novel 205 amino acid protein with similarity to human KIAA0336 gene. 

No informative BLAST results; No predictive prosite, pfam or SCOP motif.

The new protein can find application in studying the expression profile of testis-specific
genes.



unknown 

complete cDNA, complete cds, EST hits 
Sequenced by DKFZ 

Locus: /map="4 4.1 cR from top of Chr17 linkage group"
Insert length: 2509 bp 

Poly A stretch at pos. 2499, polyadenylation signal at pos. 2481 
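
A polyadenylation signal near the 3' end can be located by scanning for the usual hexamers. The sketch below assumes the signal is one of AATAAA or ATTAAA; the report does not say which motifs its annotation searched for, and the function and parameter names are hypothetical.

```python
import re


def find_polya_signals(seq: str, first_position: int = 1) -> list[tuple[int, str]]:
    """Return (position, hexamer) for each AATAAA/ATTAAA occurrence.

    first_position is the coordinate assigned to the first base of seq.
    """
    seq = seq.upper().replace(" ", "")
    # A lookahead is used so that overlapping hexamers are all reported.
    return [
        (m.start() + first_position, m.group(1))
        for m in re.finditer(r"(?=(AATAAA|ATTAAA))", seq)
    ]


# Last two sequence lines of the DKFZphtes3_9i20 insert (the first base shown is base 2451)
tail = "TTATCAAGAA GTCCTTTTTC CAATTCTGTA CATTAAATAT ATGTGTTTTA AAAAAAAAA"
print(find_polya_signals(tail, first_position=2451))
```

Under 1-based numbering the ATTAAA hexamer starts at base 2482, one more than the reported 2481. The reported positions may therefore be offsets counted from zero; this is an inference from this clone only.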



1 CTCGCCGAGA TGACCTGGGC ACCTCTGCGT TGAATCGGCA AATACTGATC 
51 AAGCCGCATT TATTCTGCTC TCAGGAACTC TAAGTCTAGC AGAGAAGATG 
101 AGGCGGTAGA AGTTCATCAA TGGCTTGGCT GGAGGACAAG CAAATTGAGG 
151 ACATTGGCAA CGGAGTGATC AAAATGATAG ATCATGAGGC CTAAAATGAA 
201 TAAGGAAAGA AGAGAAGTGG CAGAGGCTGA GAACAGAAAG AGAGGGTGGA 
251 GGGGCTGTAA ATCTTGAAGA TTAGGGTATA ATATGAGTAT ATGGGTAAGA 
301 ATTGGAAGAA TTGTGTAGGA GGCAGTAGTC AAAAAGTAGA AGCAGTTTGG 
351 AAGAGTAGTT ACAAATATCA AGAGCCAGGT GGCTAAAAGG TGGAGCTATA 
401 GGTCATTGAA GCTCAAGAAA CTGAGTCTCT AGGGCATTGG TTAAGTCATC 
451 TGTCTAGACT TCAAAGTTGT CTAGGATGAT AATTCAGAAG ACTGATCTGT
501 GCCAAAGTCA CAGGTTTTTC ACGACTGAAA ACAACATAGC AAAATAAGCC 
551 AAGATGTCTG TGGATCCAAT GACCTACGAG GCCCAGTTCT TTGGCTTCAC 
601 GCCACAAACG TGCATGCTTC GGATCTACAT TGCATTTCAA GACTACCTAT 
651 TTGAAGTGAT GCAGGCCGTT GAACAGGTTA TTCTGAAGAA GCTGGATGGC 
701 ATCCCAGACT GTGACATTAG CCCAGTGCAG ATTCGCAAAT GCACAGAGAA 
751 GTTTCTTTGC TTCATGAAAG GACATTTTGA TAACCTTTTT AGCAAAATGG 
801 AGCAACTGTT TTTGCAGCTG ATTTTACGTA TTCCCTCAAA CATCTTGCTT 
851 CCTGAAGATA AATGTAAGGA GACACCTTAT AGTGAGGAAG ATTTTCAGCA 
901 TCTCCAGAAA GAAATTGAAC AGTTACAGGA GAAGTACAAG ACTGAATTAT 
951 GTACTAAGCA GGCCCTTCTT GCAGAATTAG AAGAGCAAAA AATTGTTCAG 
1001 GCCAAACTCA AACAGACGTT GACTTTCTTT GATGAGCTTC ATAATGTTGG 
1051 CAGAGATCAT GGGACTAGTG ATTTTAGGGA GAGTTTAGTA TCCCTGGTTC 
1101 AGAACTCCAG AAAACTACAG AACATTAGAG ACAATGTGGA AAAGGAATCG 
1151 AAACGACTGA AAATATCTTA ATTGCTCAGT AGTCAAAAGG AGGAGCCTGT 
1201 CAAAAAGTAG AATCATAAGG ACTGTTCAAA CCATAAGGAC TGTTCAAATC 
1251 ATACCAGTGA CTGTTCAAAC CAACCATACT TTTTATTAGA TTTGCTTTGT 
1301 CAACTCTTTC TTGTATTCTG TGTTTTCCTC TTTTTTGGTC CACTTTGCTG 
1351 AGGTATGAAG TGTACTACTT TGAACTAGGC TGAAGCATCT GAGTCTTCTA 
1401 ATAAGTGGGA AGGGATCCAA CAAAGAAGCC ATGACCAGTT AAAGATATTT 
1451 GCAGAGTTAC ACCTTGGTCA TAAGTCCTTT GTGACCTTGA TTATTTTGGC 
1501 TTACTCTTTG GATGAGACCA GACAAGAAAA GGATTAAACG GGTGGCTCCT 
1551 TTAATATTAT TATTATTGTT TTTGAGACAA GGTCCCTTTC TGTCACCCAG 
1601 GTTAGAGTAG ATTTCAGTGG CACAATCTTG GCTCACTGCA ACCTCTGTGT 
1651 CCTGGGCTCA AGTGATCCTC CTGCCTCAGC CTCCCAAGTA GCTAGGACCA 
1701 CAGGTGCGTG TCACCATGCT TGGCTAATTT TTTTGCAGAA ACGAGGCCTC 
1751 ACTATATTGT CCAGGCTGAG TGGCTCTTTT ATTAACCAGT CATTACACTG 
1801 CGGAACAGCC AACATAGAGT ACTTGCTCTC GTCCTGTGAA TTTTCTTTCA 
1851 TGAGGGAGTC AATATGTAGT GGAAAGAAGC ATGTAGCAAA AAAGACAACC 
1901 TTGATCTTTA ATAAAAAAGA AGTTGGTTTA TTTCCAAAAT AAATCCCCTG 
1951 ACAAAAAACC TGGTGATGTT AAGCAATTGA CTGTCTTAGA GTCCAGCAGA 
2001 AGACCTTAGA CAAAAAAAGC AGAACCCACT GGAGTAGAAA AGGAAGCATG 
2051 TAGCATATAC TCAGTAGTGA AATTTAATTT TACTGACTGT TAGGTATCTA 
2101 TGCCAATTTG TTTTCATACT TCAGTTGGTT TTGGAATCTG CCTTATACCT 
2151 AATATTTATT TATTCACACT CATAAGCATC AAATATTTAA TGCCCTCAGT 
2201 GGGAAATTTG TGTTTAAACT CAATGGAATC TAATATTTCT TTATGTCGTT 
2251 AGTCCCTGTA AAATGTTAGG TCACCCAAGG AAAGGGGAGA AATAGCAATG 
2301 GTTGTTCCTA AGGTATTGCT TGCCCTCCAT GTCTTCCTAA AGAGCAGAAC 
2351 TTGGAGTTTC TCCTTTATGT AGAGAAGAAG TAACTTAGGG TGTATTTGCA 
2401 ATGAAATATT CATAGATATT GAAAGCTTGT GTTTACATGA AATATGTTTA 
2451 TTATCAAGAA GTCCTTTTTC CAATTCTGTA CATTAAATAT ATGTGTTTTA 
2501 AAAAAAAAA 



BLAST Results 
Entry AC004148 from database EMBL:

Homo sapiens chromosome 17, clone HCIT524C5, complete sequence. 
Score = 5245, P = 0.0e+00, identities = 1049/1049 
3 exons 

Entry HS556361 from database EMBL: 
human STS TIGR-A003N29. 

Score = 1005, P = 1.3e-39, identities = 201/201

Entry HSG043 from database EMBL:
human STS SHGC-36031.
Score = 955, P = 2.8e-37, identities = 205/215



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 554 bp to 1168 bp; peptide length: 205 
Category: putative protein 
Classification: no clue 



1 MSVDPMTYEA QFFGFTPQTC MLRIYIAFQD YLFEVMQAVE QVILKKLDGI 
51 PDCDISPVQI RKCTEKFLCF MKGHFDNLFS KMEQLFLQLI LRIPSNILLP 
101 EDKCKETPYS EEDFQHLQKE IEQLQEKYKT ELCTKQALLA ELEEQKIVQA 
151 KLKQTLTFFD ELHNVGRDHG TSDFRESLVS LVQNSRKLQN IRDNVEKESK 
201 RLKIS 

BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_9i20, frame 2 

TREMBLNEW:HSAB2334_1 gene: "KIAA0336"; Human mRNA for KIAA0336 gene,
complete cds., N = 1, Score = 107, P = 0.0081



>TREMBLNEW:HSAB2334_1 gene: "KIAA0336"; Human mRNA for KIAA0336 gene,
complete cds.

Length = 1,583

HSPs:

Score = 107 (16.1 bits), Expect = 8.2e-03, P = 8.1e-03 
Identities = 42/140 (30%), Positives = 76/140 (54%)

Query: 65 EKFLCFMKGHFDNLFSKMEQLFLQLILRIPSNILLPEDKCKETPYSEED----FQHLQKE 120

EK CF+K H +NL +EQ +L R ILL +D ++P + D + L+++ 

Sbjct: 796 EKEKCFIKEH-ENLKPLLEQK--ELRDRRAELILL-KDSLAKSPSVKNDPLSSVKELEEK 851

Query: 121 IEQLQE--KYKTELCTKQALLAELEEQKIVQAKLKQTLTFFDELHNVGRDHGTSDFRESL 178

IE L++ K K E K L+A ++ +K + + K+T T +EL ++ + S+ 
Sbjct: 852 IENLEKECKEKEEKINKIKLVA-VKAKKELDSSRKETQTVKEELESLRSEK--DQLSASM 908

Query: 179 VSLVQNSRKLQNIRDNVEKESKRLKI 204 

L+Q + +N+ EK+S++L + 
Sbjct: 909 RDLIQGAESYKNLLLEYEKQSEQLDV 934 



Pedant information for DKFZphtes3_9i20, frame 2 



Report for DKFZphtes3_9i20.2 



[LENGTH] 205 

[MW] 24140.13

[pI] 5.51

[KW] All_Alpha 

[KW] COILED_COIL 18.05 % 
SEQ MSVDPMTYEAQFFGFTPQTCMLRIYIAFQDYLFEVMQAVEQVILKKLDGIPDCDISPVQI

PRD cccccchhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccc 

COILS 

SEQ RKCTEKFLCFMKGHFDNLFSKMEQLFLQLILRIPSNILLPEDKCKETPYSEEDFQHLQKE

PRD cccchhhhhhhcccccchhhhhhhhhhhhhhhcccceeeccccccccccchhhhhhhhhh 

COILS CCCCCCCCCC 

SEQ IEQLQEKYKTELCTKQALLAELEEQKIVQAKLKQTLTFFDELHNVGRDHGTSDFRESLVS 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccchhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ LVQNSRKLQNIRDNVEKESKRLKIS 

PRD hhcccchhhhhhhhhhhhhhhcccc 

COILS 
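
The percentage attached to the COILED_COIL keyword appears to be the fraction of residues marked in the COILS track. The sketch below illustrates that reading under the assumption that the two marked segments together cover 37 residues, which is the count the reported percentage implies for a 205-residue peptide; the function name is hypothetical.

```python
def masked_percent(masked_residues: int, length: int) -> float:
    """Percentage of a peptide covered by a mask track, rounded to two decimals."""
    if length <= 0 or not 0 <= masked_residues <= length:
        raise ValueError("masked_residues must lie between 0 and length")
    return round(100.0 * masked_residues / length, 2)


# 37 residues marked 'C' in the COILS track (assumed count), peptide length 205
print(masked_percent(37, 205))
```

The same calculation applies to any other mask track that is reported as a percentage of the peptide length.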



(No Prosite data available for DKFZphtes3_9i20.2) 
(No Pfam data available for DKFZphtes3_9i20.2)
group: testes derived

DKFZphtes3_9k22 encodes a novel 304 amino acid protein with partial similarity to X. laevis
katanin p80.

No informative BLAST results; No predictive prosite, pfam or SCOP motif.

The new protein can find application in studying the expression profile of testis-specific
genes.



similarity to C-terminus of katanin p80 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 2676 bp 

Poly A stretch at pos. 2665, no polyadenylation signal found 



1 CTCTCTAGGC TGCCGGGCGC TGGTCGTCAG CGCCGAGGCT GGGCTGAGGC 
51 GCCGCGGTAC CATGAGGCGC CGGTACTTAA GAGATTATGG CATCAGAAAC 
101 CCACAATGTT AAAAAACGGA ACTTTTGTAA TAAGATTGAG GATCATTTCA 
151 TTGATCTTCC TAGAAAAAAG ATCTCTAATT TCACTAATAA GAACATGAAG 
201 GAGGTTAAGA AATCTCCAAA ACAGTTGGCT GCTTACATAA ATAGAACAGT 
251 TGGACAAACT GTGAAAAGCC CAGATAAACT TCGTAAAGTG ATCTATCGCA 
301 GAAAGAAAGT TCATCATCCC TTTCCAAATC CTTGTTACAG AAAAAAACAG 
351 TCCCCTGGAA GTGGGGGCTG TGACATGGCA AATAAAGAAA ATGAACTGGC 
401 TTGTGCAGGC CACCTGCCTG AAAAATTACA CCATGATAGT CGAACATATT 
451 TGGTTAACTC CAGTGATTCT GGTTCTTCAC AGACAGAAAG CCCATCATCA 
501 AAATATAGTG GGTTTTTTTC TGAGGTTTCT CAGGACCATG AAACAATGGC 
551 CCAAGTTTTG TTCAGCAGGA ATATGAGATT GAATGTAGCT TTAACTTTCT 
601 GGAGAAAGAG AAGTATAAGT GAACTTGTAG CTTATTTGTT GAGGATAGAA 
651 GATCTTGGCG TTGTGGTAGA TTGCCTTCCT GTGCTCACCA ATTGTTTACA 
701 GGAAGAAAAA CAATATATCT CACTTGGCTG CTGTGTTGAC TTGTTGCCTC 
751 TAGTAAAGTC ACTACTTAAA AGCAAATTTG AAGAATATGT TATAGTTGGT 
801 TTAAACTGGC TTCAAGCAGT CATTAAAAGG TGGTGGTCAG AACTATCATC 
851 CAAAACAGAA ATTATAAATG ATGGAAATAT TCAAATTTTA AAACAACAAT 
901 TAAGTGGATT ATGGGAACAG GAAAACCATC TTACTTTGGT TCCAGGATAT
951 ACTGGTAATA TAGCTAAGGA TGTAGATGCT TATTTATTAC AGTTACATTG 
1001 AGAGATTTCA TCTACTAAAG AGCATTTGGT TTTTCAAAAC ATCCCTGAAC 
1051 TGTATAATTT ACAAAAAAAA AAGTCTCGTC TGAGAACTGT GAACTGTGGA 
1101 AGAAATCAAA ACTATTTTTT CTTTTAAAAA GCCACGTAAT GAAACCACTA 
1151 ATGAAATCCC AGCAATCTGC TTCACATTGA AGTGGAAAAA TATCCAAAAG 
1201 GAGCAGCTTC AATTTCATTG AGGTGAAAGT GCACTATGAA GATTGTTCAC 
1251 CTTTGCTGCA TTTGGGAGTT ATATGGTTAT TTGGTAACAT TAAGAACTAC 
1301 TGGATTTTAA TGCAATCCTG CATAAAAATA TAATTTATAC TATGTGAAAA 
1351 AATAAGACAG GACTTACCAC TAGGAACCAC CAAGACCAAT CATCATTAAC 
1401 TTTTTTAAGA TTGTGTTTTA TTAAAAAAAA AAAACACTTA AATGTGTGCA 
1451 GCTATTTTCT TATGTTGAAA AGACTGAAAG TTTAAAACAT GAAAAAAATC 
1501 AATATTAAAC ATTTTTTGTT CACACTGAGA TACTGTGTAT GTAAAATGCC 
1551 TTAATTATTA ATAAGCCAAT GTGTTATGAT ACCAATATCT GTTTTAAAAA 
1601 ACTAAAACCA ACCATGCTTC TGGCATGATA AAATCATGGA ATTAAATCAG 
1651 GGGTTTACAT TCTTGTAGAG TGTTCTTGAA ACACTCTCTG CACCATTTTT 
1701 AAAACTTGAG AATAGTTTTA GTATCTCTGA TATTTTTTGC CAGAATCATC 
1751 ATGTCATGTA TGAATGTGTT ATCCCTATCT AAGGAAAAAG GTGAATATGT 
1801 TTTTGTATGA ATGTTTAACT GGAAATGTCC ATGGACTTGG CTAATTTATA 
1851 TTTACTTTTT ATTGTACATA GATTTCTAAT ATTTTTCATT CCTGTATCAT 
1901 TTAAACTTCC TTCATTTGAG TAAATTCACT AAATATTTCT ATTTTTTTGC 
1951 TTTTTTAAAT TCTGATTTTA TATGAATTCT AATTCTTTTT CACTACATAT
2001 GTTTTAAAGA GTTACATACA GTGATTTAGA ATGGTTTACA GTTAATGCTG 
2051 ATCTTGTATT TTAAATTCCA ACACTTTGTG TCACTACCTC CTCTAATGGT 
2101 TAGTATGATA TGCTAGCAGA CTGTATGAGG TCTTTTTTTA AAATACCACT 
2151 TTTAGTGTCA GTGAACCAAA TTCTGGAATG TCTTAACAGC TCTAAATCTT 
2201 ACTTGTCTTG AAAATGATTG GGGTTTAATA CCACTGCTGG TGGTTCACAC 
2251 ATCATCCCAT CCTTAATATG CCTGACAGGC ATCTGAGCAA AGGTTTTTAG 
2301 TAATTGAATT TCTCTGCAGT AGTCCTTCAA GCACTTGAAT GTAAACCTTT 
2351 AGCATTTATT CGTTTAATGA CTACTGATAC GAATCTCAAG CAGATTTCTT 
2401 GCTCTTAAAA GTTATGTTTC ACTGAGTTCT GGTTTTGTGT AGCTATATTT 
2451 TATATAGCTA GATATTCCTC ACAGTGAACA TGAATTGTAA TAATTGGTTA 
2501 TTTCCTTAAG TCTTTAGATT ATAATAATTT CAGATTATTG CACGTCTGTG 
2551 ATTTGAGAGG TGAGTTATTT AAGAGGCCAG TTTTCAGGAC ATGGGAATTT 
2601 GAATTGTAAA CCTGTTATCT CTGTGAAACT TTTAACATGA TAAAATATAA 
2651 CCTTTCTTTG TGCTTAAAAA AAAAAA 
BLAST Results



Entry HS541354 from database EMBL: 
human STS WI-11640. 
Score = 1267, P = 7.1e-50, identities = 271/281



Medline entries 



98227670: 

Katanin, a microtubule-severing protein, is a novel AAA ATPase
that targets to the centrosome using a WD40-containing subunit. 



Peptide information for frame 3 



ORF from 87 bp to 998 bp; peptide length: 304 
Category: similarity to known protein 
Classification: unclassified 



1 MASETHNVKK RNFCNKIEDH FIDLPRKKIS NFTNKNMKEV KKSPKQLAAY 
51 INRTVGQTVK SPDKLRKVIY RRKKVHHPFP NPCYRKKQSP GSGGCDMANK 
101 ENELACAGHL PEKLHHDSRT YLVNSSDSGS SQTESPSSKY SGFFSEVSQD 
151 HETMAQVLFS RNMRLNVALT FWRKRSISEL VAYLLRIEDL GVVVDCLPVL 
201 TNCLQEEKQY ISLGCCVDLL PLVKSLLKSK FEEYVIVGLN WLQAVIKRWW 
251 SELSSKTEII NDGNIQILKQ QLSGLWEQEN HLTLVPGYTG NIAKDVDAYL 
301 LQLH 

BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_9k22, frame 3 

TREMBL:AF056021_1 product: "p80 katanin"; Xenopus laevis p80 katanin
mRNA, partial cds., N = 1, Score = 146, P = 1.2e-07

TREMBL:AF052432_1 product: "katanin p80 subunit"; Homo sapiens katanin
p80 subunit mRNA, complete cds., N = 1, Score = 150, P = 1.2e-07

TREMBL:AF052433_1 product: "katanin p80 subunit"; Strongylocentrotus
purpuratus katanin p80 subunit mRNA, complete cds., N = 2, Score = 146,
P = 4.2e-07



>TREMBL:AF052432_1 product: "katanin p80 subunit"; Homo sapiens katanin p80
subunit mRNA, complete cds.
Length = 655

HSPs:

Score = 150 (22.5 bits), Expect = 1.2e-07, P = 1.2e-07
Identities = 35/105 (33%), Positives = 55/105 (52%)

Query: 145 SEVSQDHETMAQVLFSRNMRLNVALTFWRKRSISELVAYLLRIEDLGVVVDCLPVLTNCL 204

S++ + H+TM VL SR+ L+ W I V + I DL WVD L N + 
Sbjct: 489 SQIRKGHDTMCVVLTSRHKNLDTVRAVWTMGDIKTSVDSAVAINDLSWVDLL NIV 544 

Query: 205 QEEKQYISLGCCVDLLPLVKSLLKSKFEEYVIVGLNWLQAVIKRW 249

++ L C +LP ++ LL+SK+E YV G L+ +++R+ 

Sbjct: 545 NQKASLWKLDLCTTVLPQIEKLLQSKYESYVQTGCTSLKLILQRF 589 



Pedant information for DKFZphtes3_9k22, frame 3 



Report for DKFZphtes3_9k22.3



[LENGTH] 304

[MW] 34767.24

[pI] 9.18

[KW] All_Alpha
[KW] LOW_COMPLEXITY 3.95 %

SEQ MASETHNVKKRNFCNKIEDHFIDLPRKKISNFTNKNMKEVKKSPKQLAAYINRTVGQTVK 

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhccccc 

SEQ SPDKLRKVIYRRKKVHHPFPNPCYRKKQSPGSGGCDMANKENELACAGHLPEKLHHDSRT 

SEG 

PRD ccchhhhhhhhhhhcccccccccccccccccccccccccchhhhhhccccccccccccce 

SEQ YLVNSSDSGSSQTESPSSKYSGFFSEVSQDHETMAQVLFSRNMRLNVALTFWRKRSISEL 

SEG 

PRD eeecccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ VAYLLRIEDLGVVVDCLPVLTNCLQEEKQYISLGCCVDLLPLVKSLLKSKFEEYVIVGLN 

SEG xxxxxxxxxxxx 

PRD hhhhhhhhhcceeeeeeccchhhhhhhhceeeccceeeehhhhhhhhhhhheeeeeeehh 

SEQ WLQAVIKRWWSELSSKTEIINDGNIQILKQQLSGLWEQENHLTLVPGYTGNIAKDVDAYL 

SEG 

PRD hhhhhhhhhhhhcccceeeeccccccccccccchhhhhhhhhhccccccccchhhhhhhh 

SEQ LQLH 

SEG 

PRD hccc 

(No Prosite data available for DKFZphtes3_9k22.3)
(No Pfam data available for DKFZphtes3_9k22.3)
Prosite Key
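
The CONSENSUS entries in this key use PROSITE pattern notation: `x` is any residue, `[..]` lists allowed residues, `{..}` lists forbidden residues, `(n)` or `(n,m)` gives a repeat count, and `<` or `>` anchors the pattern to the N- or C-terminus. The converter below is a minimal sketch of how such a pattern might be turned into a regular expression for scanning a peptide. The function name is hypothetical and only the constructs listed here are handled.

```python
import re

# One pattern element: x, a single residue, [allowed] or {forbidden}, plus an optional repeat.
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern such as 'N-{P}-[ST]-{P}.' into a regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        at_start = element.startswith("<")
        at_end = element.endswith(">")
        m = _ELEMENT.fullmatch(element.strip("<>"))
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            rx = "."                           # any residue
        elif core.startswith("{"):
            rx = "[^" + core[1:-1] + "]"       # forbidden residues
        else:
            rx = core                          # single residue or [allowed] set
        if low:
            rx += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if at_start else "") + rx + ("$" if at_end else ""))
    return "".join(parts)


# N-glycosylation site pattern applied to a fragment of the DKFZphtes3_9k22 peptide
regex = prosite_to_regex("N-{P}-[ST]-{P}.")
print(regex, re.search(regex, "ISNFTNKNM").group())
```

Patterns of this kind are short and match frequently by chance, so a hit is a candidate site rather than evidence of modification.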

NAME: N-glycosylation site. 
CONSENSUS: N-{P}-[ST]-{P}. 

NAME: Glycosaminoglycan attachment site. 
CONSENSUS: S-G-x-G. 

NAME: Tyrosine sulfation site. 

NAME: cAMP- and cGMP-dependent protein kinase phosphorylation site. 
CONSENSUS: [RK](2)-x-[ST]. 

NAME: Protein kinase C phosphorylation site. 
CONSENSUS: [ST]-x-[RK].

NAME: Casein kinase II phosphorylation site. 
CONSENSUS: [ST]-x(2)-[DE]. 

NAME: Tyrosine kinase phosphorylation site. 
CONSENSUS: [RK]-x(2,3)-[DE]-x(2,3)-Y.

NAME: N-myristoylation site. 

CONSENSUS: G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}. 

NAME: Amidation site. 
CONSENSUS: x-G-[RK]-[RK]. 

NAME: Aspartic acid and asparagine hydroxylation site.
CONSENSUS: C-x-[DN]-x(4)-[FY]-x-C-x-C. 

NAME: Vitamin K-dependent carboxylation domain.

CONSENSUS: x(12)-E-x(3)-E-x-C-x(6)-[DEN]-x-[LIVMFY]-x(9)-[FYW].
NAME: Phosphopantetheine attachment site. 

CONSENSUS: [DEQGSTALMKRH]-[LIVMFYSTAC]-[GNQ]-[LIVMFYAG]-[DNEKHS]-S-[LIVMST]-
CONSENSUS: {PCFY}-[STAGCPQLIVMF]-[LIVMATN]-[DENQGTAKRHLM]-[LIVMWSTA]-[LIVGSTACR]-
CONSENSUS: x(2)-[LIVMFA].

NAME: Acyl carrier protein phosphopantetheine domain profile. 

NAME: Prokaryotic membrane lipoprotein lipid attachment site. 

CONSENSUS: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C. 

NAME: Prokaryotic N-terminal methylation site.

CONSENSUS: [KRHEQSTAG]-G-[FYLIVM]-[ST]-[LT]-[LIVP]-E-[LIVMFWSTAG](14).

NAME: Prenyl group binding site (CAAX box). 
CONSENSUS: C-{DENQ}-[LIVM]-x>.

NAME: Protein splicing signature. 

CONSENSUS: [DNEG]-x-[LIVFA]-[LIVMY]-[LVAST]-H-N-[STC].

NAME: Endoplasmic reticulum targeting sequence. 
CONSENSUS: [KRHQSA]-[DENQ]-E-L>.

NAME: Microbodies C-terminal targeting signal.
CONSENSUS: [STAGCN]-[RKH]-[LIVMAFY]>.

NAME: Gram-positive cocci surface proteins 'anchoring' hexapeptide.
CONSENSUS: L-P-x-T-G-[STGAVDE].

NAME: Bipartite nuclear targeting sequence. 

NAME: Cell attachment sequence. 
CONSENSUS: R-G-D. 

NAME: ATP/GTP-binding site motif A (P-loop). 
CONSENSUS: [AG]-x(4)-G-K-[ST].

NAME: Cyclic nucleotide-binding domain signature 1.

CONSENSUS: [LIVM]-[VIC]-x(2)-G-[DENQTA]-x-[GAC]-x(2)-[LIVMFY](4)-x(2)-G.
NAME: Cyclic nucleotide-binding domain signature 2.

CONSENSUS: [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV].
NAME: cAMP/cGMP binding motif. 
NAME: EF-hand calcium-binding domain. 

CONSENSUS: D-x-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-[DENQSTAGC]-x(2)-
CONSENSUS: [DE]-[LIVMFYW].

NAME: Actinin-type actin-binding domain signature 1.
CONSENSUS: [EQ]-x(2)-[ATV]-[FY]-x(2)-W-x-N.

NAME: Actinin-type actin-binding domain signature 2.

CONSENSUS: [LIVM]-x-[SGN]-[LIVM]-[DAGHE]-[SAG]-x-[DNEAG]-[LIVM]-x-[DEAG]-x(4)-
CONSENSUS: [LIVM]-x-[LM]-[SAG]-[LIVM]-[LIVMT]-W-x-[LIVM](2).

NAME: Anaphylatoxin domain signature. 

CONSENSUS: [CSH]-C-x(2)-[GAP]-x(7,8)-[GASTDEQR]-C-[GASTDEQL]-x(3,9)-[GASTDEQN]-x(2)-
CONSENSUS: [CE]-x(6,7)-C-C.

NAME: Anaphylatoxin domain profile. 

NAME: Apple domain. 

CONSENSUS: C-x(3)-[LIVMFY]-x(5)-[LIVMFY]-x(3)-[DENQ]-[LIVMFY]-x(10)-C-x(3)-C-T-
CONSENSUS: x(4)-C-x-[LIVMFY]-F-x-[FY]-x(13,14)-C-x-[LIVMFY]-[RK]-x-[ST]-x(14,15)-
CONSENSUS: S-G-x-[ST]-[LIVMFY]-x(2)-C.

NAME: Band 4.1 family domain signature 1. 

CONSENSUS: W-[LIV]-x(3)-[KRQ]-x-[LIVM]-x(2)-[QH]-x(0,2)-[LIVMF]-x(6,8)-[LIVMF]-
CONSENSUS: x(3,5)-F-[FY]-x(2)-[DENS]. 

NAME: Band 4.1 family domain signature 2. 

CONSENSUS: [HYW]-x(9)-[DENQSTV]-[SA]-x(3)-[FY]-[LIVM]-x(2)-[ACV]-x(2)-[LM]-x(2)-
CONSENSUS: [FY]-G-x-[DENQST]-[LIVMFYS].

NAME: Band 4.1 family domain profile. 

NAME: C1q domain signature.

CONSENSUS: F-x(5)-[ND]-x(4)-[FYWL]-x(6)-F-x(5)-G-x-Y-x-F-x-[FY].

NAME: C-terminal cystine knot signature.

CONSENSUS: C-C-x(13)-C-x(2)-[GN]-x(12)-C-x-C-x(2,4)-C.

NAME: C-terminal cystine knot profile. 

NAME: CUB domain profile. 

NAME: Death domain profile. 

NAME: EGF-like domain signature 1 . 
CONSENSUS: C-x-C-x(5)-G-x(2)-C. 

NAME: EGF-like domain signature 2. 
CONSENSUS: C-x-C-x(2)-[GP]-[FYW]-x(4,8)-C. 

NAME: Calcium-binding EGF-like domain pattern signature. 

CONSENSUS: [DEQN]-x-[DEQN](2)-C-x(3,14)-C-x(3,7)-C-x-[DN]-x(4)-[FY]-x-C.

NAME: Laminin-type EGF-like (LE) domain signature.

CONSENSUS: C-x(1,2)-C-x(5)-G-x(2)-C-x(2)-C-x(3,4)-[FYW]-x(3,15)-C.

NAME: Coagulation factors 5/8 type C domain (FA58C) signature 1 . 
CONSENSUS: [GAS]-W-x(7J5)-[rW]-[LIV]-x-{UVFAM^ 
CONSENSUS: [LIVT]-[QKM]-G.

NAME: Coagulation factors 5/8 type C domain (FA58C) signature 2. 
CONSENSUS: P-x(8,10)-[LM]-R-x-[GE]-[LIVP]-x-G-C.

NAME: Forkhead-associated (FHA) domain profile. 

NAME: Fibrinogen beta and gamma chains C-terminal domain signature.
CONSENSUS: W-W-[LIVMFYW]-x(2)-C-x(2)-[GSA]-x(2)-N-G.

NAME: Type I fibronectin domain. 
CONSENSUS: C-x(6,8)-[LFY]-x(5)-[FYW]-x-[RK]-x(8,10)-C-x-C-x(6,9)-C.
NAME: Type II fibronectin collagen-binding domain.

CONSENSUS: C-x(2)-P-F-x-[FYWI]-x(7)-C-x(8,10)-W-C-x(4)-[DNSR]-[FYW]-x(3,5)-[FYW]-x-
CONSENSUS: [FYWI]-C.

NAME: Hemopexin domain signature. 

CONSENSUS: [LIFAT]-x(3)-W-x(2,3)-[PE]-x(2)-[LIVMFY]-[DENQS]-[STA]-[AV]-[LIVMFY].

NAME: Kringle domain signature. 
CONSENSUS: [FY]-C-R-N-P-[DNR]. 

NAME: Kringle domain profile. 

NAME: LDL-receptor class A (LDLRA) domain signature. 

CONSENSUS: C-[VILMA]-x(5)-C-[DNH]-x(3)-[DENQHT]-C-x(3,4)-[STADE]-[DEH]-[DE]-x(1,5)-
CONSENSUS: C. 

NAME: LDL-receptor class A (LDLRA) domain profile. 
NAME: C-type lectin domain signature. 

CONSENSUS: C-[UVMFYATC]-x(5J2>[WL]-x-[DNSR]-x^ 
CONSENSUS: C. 

NAME: C-type lectin domain profile. 

NAME: Link domain signature. 

CONSENSUS: C-x(15)-A-x(3,4)-G-x(3)-C-x(2)-G-x(8,9)-P-x(7)-C.
NAME: Osteonectin domain signature 1.

CONSENSUS: C-x-[DN]-x(2)-C-x(2)-G-[KRH]-x-C-x(6,7)-P-x-C-x-C-x(3,5)-C-P.

NAME: Osteonectin domain signature 2. 
CONSENSUS: F-P-x-R-[IM]-x-D-W-L-x-[NQ]. 

NAME: Somatomedin B domain signature. 

CONSENSUS: C-x-C-x(3)-C-x(5)-C-C-x-[DN]-[FY]-x(3)-C.

NAME: Thyroglobulin type-1 repeat signature.

CONSENSUS: [FYWHP]-x-P-x-C-x(3,4)-G-x-[FYW]-x(3)-Q-C-x(4,10)-C-[FYW]-C-V-x(3,4)-
CONSENSUS: [SG].

NAME: P-type 'Trefoil' domain signature.

CONSENSUS: R-x(2)-C-x-[FYPST]-x(3,4)-[ST]-x(3)-C-x(4)-C-C-[FYWH].
NAME: Cellulose-binding domain, bacterial type.

CONSENSUS: W-N-[STAGR]-[STDN]-[LIVM]-x(2)-[GST]-x-[GST]-x(2)-[LIVMFT]-[GA].
NAME: Cellulose-binding domain, fungal type. 

CONSENSUS: C-G-G-x(4,7)-G-x(3)-C-x(5)-C-x(3,5)-[NHG]-x-[FYWM]-x(2)-Q-C. 

NAME: Chitin recognition or binding domain signature. 
CONSENSUS: C-x(4,5)-C-C-S-x(2)-G-x-C-G-x(4)-[FYW]-C.

NAME: Barwin domain signature 1.
CONSENSUS: C-G-[KR]-C-L-x-V-x-N.

NAME: Barwin domain signature 2.
CONSENSUS: V-[DN]-Y-[EQ]-F-V-[DN]-C.

NAME: BIR repeat. 

CONSENSUS: [HKEPILVY]-x(2)-R-x(3,7)-[FYW]-x(11,14)-[STAN]-G-[LMF]-x-[FYHDA]-x(4)-
CONSENSUS: [DESL]-x(2,3)-C-x(2)-C-x(6)-[WA]-x(9)-H-x(4)-[PRSD]-x-C-x(2)-[LIVMA].

NAME: WAP-type 'four-disulfide core' domain signature. 
CONSENSUS: C-x-{C}-[DN]-x(2)-C-x(5)-C-C.

NAME: Phorbol esters / diacylglycerol binding domain. 

CONSENSUS: H-x-[LIVMFYW]-x(8,11)-C-x(2)-C-x(3)-[LIVMFC]-x(5,10)-C-x(2)-C-x(4)-[HD]-
CONSENSUS: x(2)-C-x(5,9)-C. 

NAME: C2 domain signature. 

CONSENSUS: [ACG]-x(2)-L-x(2,3)-D-x(1,2)-[NGSTLIF]-[GTMR]-x-[STAP]-D-[PA]-[FY].
NAME: C2-domain profile.

NAME: CAP-Gly domain signature. 

CONSENSUS: G-x(8,10)-[FYW]-x-G-[LIVM)-x-[LIVMF 

CONSENSUS: x(2)-[LY]-F.

NAME: Ly-6 / u-PAR domain signature. 

CONSENSUS: [EQR]-C-[LIVMFYAH]-x-C-x(5,8)-C-x(3,8)-[EDNQSTV]-C-{C}-x(5)-C-
CONSENSUS: x(12,24)-C. 

NAME: MAM domain signature. 

CONSENSUS: G-x-[LIVMFY](2)-x(3)-[STA]-x(10,11)-[LV]-x(4)-[LIVMF]-x(6,7)-C-[LIVM]-x-
CONSENSUS: F-x-[LIVMFY]-x(3)-[GSC]. 

NAME: MAM domain profile. 

NAME: PH domain profile. 

NAME: Phosphotyrosine interaction domain (PID) profile. 
NAME: Src homology 2 (SH2) domain profile. 
NAME: Src homology 3 (SH3) domain profile. 
NAME: VWFC domain signature. 

CONSENSUS: C-x(2,3)-C-x-C-x(6,14)-C-x(3,4)-C-x(2,10)-C-x(9,16)-C-C-x(2,4)-C.
NAME: WW/rsp5/WWP domain signature.

CONSENSUS: W-x(9,11)-[VFY]-[FYW]-x(6,7)-[GSTNE]-[GSTQCR]-[FYW]-x(2)-P.
NAME: WW/rsp5/WWP domain profile. 
NAME: ZP domain signature. 

CONSENSUS: [LIVMFYW]-x(7MSTAPDNL]-x(3)-[LIV^ 

CONSENSUS: [LIVMFYW]-x-[ST]-[PSL]-x(2,4)-[DENS]-x-[STADNQLF]-x(6)-[LIVM](2)-x(3,4)-
CONSENSUS: C. 

NAME: S layer homology domain signature. 

CONSENSUS: [LVFYT]-x-[DA]-x(2,5)-[DNGSATPHY]-[WYFPDA]-x(4)-[LIV]-x(2)-[GTALV]-
CONSENSUS: x(4,6)-[LIVFYC]-x(2)-G-x-[PGSTA]-x(2,3)-[MFYA]-x-[PGAV]-x(3,10)-[LIVMA]-
CONSENSUS: [STKR]-[RY]-x-[EQ]-x-[STALIVM]. 

NAME: 'Homeobox' domain signature. 

CONSENSUS: [LIVMFYG]-[ASLVR]-x(2)-[LIVMSTACN]-x-[LIVM]-x(4)-[LIV]-[RKNQESTAIY]-
CONSENSUS: [LIVFSTNKH]-W-[FYVC]-x-[NDQTAH]-x(5)-[RKNAIMW].

NAME: 'Homeobox' domain profile. 

NAME: 'Homeobox' antennapedia-type protein signature.
CONSENSUS: [LIVMFE]-[FY]-P-W-M-[KRQTA].

NAME: 'Homeobox' engrailed-type protein signature. 
CONSENSUS: L-M-A-Q-G-L-Y-N. 

NAME: 'Paired box' domain signature.
CONSENSUS: R-P-C-x(11)-C-V-S.

NAME: 'POU' domain signature 1. 

CONSENSUS: [RKQ]-R-[LIM]-x-[LF]-G-[LIVMFY]-x-Q-x-[DNQ]-V-G.
NAME: 'POU' domain signature 2.

CONSENSUS: S-Q-[ST]-[TA]-I-[SC]-R-F-E-x-[LSQ]-x-[LI]-[ST].
NAME: Zinc finger, C2H2 type, domain.

CONSENSUS: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.

NAME: Zinc finger, C3HC4 type (RING finger), signature.
CONSENSUS: C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA].

NAME: Nuclear hormones receptors DNA-binding region signature. 
CONSENSUS: C-x(2)-C-x-[DE]-x(5)-[HN]-[FY]-x(4)-C-x(2)-C-x(2)-F-F-x-R.

NAME: GATA-type zinc finger domain. 

CONSENSUS: C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C.
NAME: Poly(ADP-ribose) polymerase zinc finger domain signature.
CONSENSUS: C-[KR]-x-C-x(3)-I-x-K-x(3)-[RG]-x(16,18)-W-[FYH]-H-x(2)-C.

NAME: Poly(ADP-ribose) polymerase zinc finger domain profile.

NAME: Fungal Zn(2)-Cys(6) binuclear cluster domain signature. 

CONSENSUS: [GASTPV]-C-x(2)-C-[RKHSTACW]-x(2)-[RKHQ]-x(2)-C-x(5,12)-C-x(2)-C-x(6,8)-
CONSENSUS: C. 

NAME: Fungal Zn(2)-Cys(6) binuclear cluster domain profile. 

NAME: Prokaryotic dksA/traR C4-type zinc finger. 
CONSENSUS: C-[DES]-x-C-x(3)-I-x(3)-R-x(4)-P-x(4)-C-x(2)-C.

NAME: Copper-fist domain signature. 

CONSENSUS: M-[LIVMF](3)-x(3)-K-[MY]-A-C-x(2)-C-I-[KR]-x-H-[KR]-x(3)-C-x-H-x(8)-
CONSENSUS: [KR]-x-[KR]-G-R-P. 

NAME: Copper fist DNA binding domain profile. 

NAME: Leucine zipper pattern. 
CONSENSUS: L-x(6)-L-x(6)-L-x(6)-L. 

NAME: bZIP transcription factors basic domain signature. 

CONSENSUS: [KR]-x(1,3)-[RKSAQ]-N-x(2)-[SAQ](2)-x-[RKTAENQ]-x-R-x-[RK].

NAME: Myb DNA-binding domain repeat signature 1.
CONSENSUS: W-[ST]-x(2)-E-[DE]-x(2)-[LIV].

NAME: Myb DNA-binding domain repeat signature 2.

CONSENSUS: W-x(2)-[LI]-[SAG]-x(4,5)-R-x(8)-[YW]-x(3)-[LIVM].

NAME: Myc-type, 'helix-loop-helix' dimerization domain signature. 

CONSENSUS: [DENSTAP]-K-[LIVMWAGSN]-{FYWCPHKR}-[LIVT]-[LIV]-x(2)-[STAV]-[LIVMSTAC]-x-
CONSENSUS: [VMFYH]-[LIVMTA]-{P}-{P}-[LIVMSR]. 

NAME: p53 tumor antigen signature. 
CONSENSUS: M-C-N-S-S-C-M-G-G-M-N-R-R. 

NAME: CBF-A/NF-YB subunit signature.

CONSENSUS: C-V-S-E-x-I-S-F-[LIVM]-T-[SG]-E-A-[SC]-[DE]-[KRQ]-C.
NAME: CBF-B/NF-YA subunit signature. 

CONSENSUS: Y-V-N-A-K-Q-Y-x-R-I-L-K-R-R-x-A-R-A-K-L-E. 
NAME: 'Cold-shock' DNA-binding domain signature. 

CONSENSUS: [FY]-G-F-I-x(6,7)-[DER]-[LIVM]-F-x-H-x-[STKR]-x-[LIVMFY].

NAME: CTF/NF-I signature.

CONSENSUS: R-K-R-K-Y-F-K-K-H-E-K-R. 

NAME: Ets-domain signature 1 . 

CONSENSUS: L-[FYWMQEDH]-F-[LI]-[LVQK]-x-[LI]-L. 
NAME: Ets-domain signature 2. 

CONSENSUS: [RKH]-x(2)-M-x-Y-[DENQ]-x-[LIVM]-[STAG]-R-[STAG]-[LI]-R-x-Y.

NAME: Ets-domain profile. 

NAME: Fork head domain signature 1.

CONSENSUS: [KR]-P-[PTQ]-[FYLVQH]-S-[FY]-x(2)-[LIVM]-x(3,4)-[AC]-[LIM].

NAME: Fork head domain signature 2. 
CONSENSUS: W-[QKR]-[NS]-S-[LIV]-R-H. 

NAME: Fork head domain profile. 

NAME: HSF-type DNA-binding domain signature. 

CONSENSUS: L-x(3)-[FY]-K-H-x-N-x-[STAN]-S-F-[LIVM]-R-Q-L-[NH]-x-Y-x-[FYW]-[RKH]-K-
CONSENSUS: [LIVM]. 

NAME: Tryptophan pentad repeat (IRF family) signature. 

CONSENSUS: W-x-[DNH]-x(5)-[LIVF]-x-[IV]-P-W-x-H-x(9,10)-[DE]-x(2)-[LIVF]-F-[KRQ]-x-
CONSENSUS: [WR]-A.
NAME: LIM domain signature.

CONSENSUS: C-x(2)-C-x(15,21)-[FYWH]-H-x(2)-[CH]-x(2)-C-x(2)-C-x(3)-[LIVMF].

NAME: LIM domain profile. 

NAME: NF-kappa-B/Rel/dorsal domain signature. 
CONSENSUS: F-R-Y-x-C-E-G. 

NAME: MADS-box domain signature. 

CONSENSUS: R-x-[RK]-x(5)-I-x-[DN]-x(3)-[KR]-x(2)-T-[FY]-x-[RK](3)-x(2)-[LIVM]-x-
CONSENSUS: K(2)-A-x-E-[LIVM]-[ST]-x-L-x(4)-[LIVM]-x-[LIVM](3)-x(6)-[LIVMF]-x(2)-
CONSENSUS: [FY]. 

NAME: MADS-box domain profile. 

NAME: T-box domain signature 1. 

CONSENSUS: L-W-x(2)-[FC]-x(3,4)-[NT]-E-M-[LIV](2)-T-x(2)-G-[RG]-[KRQ].
NAME: T-box domain signature 2.

CONSENSUS: [LIVMYW]-H-[PADH]-[DEN]-[GS]-x(3)-G-x(2)-W-M-x(3)-[IVA]-x-F.
NAME: TEA domain signature.

CONSENSUS: G-R-N-E-L-I-x(2)-Y-I-x(3)-[TC]-x(3)-R-T-[RK](2)-Q-[LIVM]-S-S-H-[LIVM]-
CONSENSUS: Q-V. 

NAME: Transcription factor TFIIB repeat signature. 

CONSENSUS: G-[KR]-x(3)-[STAGN]-x-[LIVMYA]-[GSTA](2)-[CSAV]-[LIVM]-[LIVMFY]-[LIVMA]-
CONSENSUS: [GSA]-[STAC].

NAME: Transcription factor TFIID repeat signature.

CONSENSUS: Y-x-P-x(2)-[IF]-x(2)-[LIVM](2)-x-[KRH]-x(3)-P-[RKQ]-x(3)-L-[LIVM]-F-x-
CONSENSUS: [STN]-G-[KR]-[LIVM]-x(3)-G-[TAGL]-[KR]-x(7)-[AGC]-x(7)-[LIVM].

NAME: TFIIS zinc ribbon domain signature. 

CONSENSUS: C-x(2)-C-x(9)-[LIVMQSAR]-[QH]-[STQL]-[RA]-[SACR]-x-[DE]-[DET]-[PGSEA]-
CONSENSUS: x(6)-C-x(2,5)-C-x(3)-[FW].

NAME: TSC-22 / dip / bun family signature. 
CONSENSUS: M-D-L-V-K-x-H-L-x(2)-A-V-R-E-E-V-E. 

NAME: Prokaryotic transcription elongation factors signature 1. 

CONSENSUS: [ST]-x(2)-[GS]-x(3)-[LI]-x(2)-E-L-x(2)-L-x(3,4)-R-x(2)-[IV]-x(3)-[LIV]-
CONSENSUS: x(6)-G-D-x(2)-E-N-[GSA]-x-Y.

NAME: Prokaryotic transcription elongation factors signature 2. 

CONSENSUS: S-x(2)-S-P-[LIVM]-[AG]-x-[SAG]-[LIVM]-[LIVMY]-x(4)-[DG]-[DE].

NAME: DEAD-box subfamily ATP-dependent helicases signature.
CONSENSUS: [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN].

NAME: DEAH-box subfamily ATP-dependent helicases signature.
CONSENSUS: [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR].

NAME: Eukaryotic putative RNA-binding region RNP-1 signature.
CONSENSUS: [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYLM].

NAME: Fibrillarin signature. 

CONSENSUS: [GST]-[LIVMAP]-V-Y-A-[IV]-E-[FY]-[SA]-x-R-x(2)-R-[DE].
NAME: MCM family signature. 

CONSENSUS: G-[IVT]-[LVAC](2)-[IVT]-D-[DE]-[FL]-[DNST].
NAME: MCM family domain. 
NAME: XPA protein signature 1. 

CONSENSUS: C-x-[DE]-C-x(3)-[LIVMF]-x(1,2)-D-x(2)-L-x(3)-F-x(4)-C-x(2)-C.
NAME: XPA protein signature 2. 

CONSENSUS: [LIVM](2)-T-[KR]-T-E-x-K-x-[DE]-Y-[LIVMF](2)-x-D-x-[DE]. 
NAME: XPG protein signature 1. 

CONSENSUS: [VI]-[KRE]-P-x-[FYIL]-V-F-D-G-x(2)-[PIL]-x-[LVC]-K.
NAME: XPG protein signature 2.

CONSENSUS: [GS]-[LIVM]-[PER]-[FYS]-[LIVM]-x-A-P-x-E-A-[DE]-[PAS]-[QS]-[CLM].
NAME: Bacterial regulatory proteins, araC family signature. 

CONSENSUS: [KRQ]-[LIVMA]-x(2)-[GSTALIV]-{FYWPGDN}-x(2)-[LIVMSA]-x(4,9)-[LIVMF]-
CONSENSUS: x(2)-[LIVMSTA]-[GSTACIL]-x(3)-[GANQRF]-[LIVMFY]-x(4,5)-[LFY]-x(3)-
CONSENSUS: [FYIVA]-{FYWHCM}-x(3)-[GSADENQKR]-x-[NSTAPKL]-[PARL].

NAME: Bacterial regulatory proteins, araC family DNA-binding domain profile.

NAME: Bacterial regulatory proteins, arsR family signature.
CONSENSUS: C-x(2)-D-[LIVM]-x(6)-[ST]-x(4)-S-[HYR]-[HQ]. 

NAME: Bacterial regulatory proteins, asnC family signature.

CONSENSUS: [GSTAP]-x(2)-[DNEA]-[LIVM]-[GSA]-x(2)-[LIVMFY]-[GN]-[LIVMST]-[ST]-x(6)-R-
CONSENSUS: [LVT]-x(2)-[LIVM]-x(3)-G.

NAME: Bacterial regulatory proteins, crp family signature. 

CONSENSUS: [LIVM]-[STAG]-[RHNW]-x(2)-[LIM]-[GA]-x-[LIVMFYA]-[LIVSC]-[GA]-x-[STACN]-
CONSENSUS: x(2)-[MST]-x-[GSTN]-R-x-[LIVMF]-x(2)-[LIVMF].

NAME: Bacterial regulatory proteins, deoR family signature.

CONSENSUS: R-x(3)-[LIVM]-x(3)-[LIVM]-x(16,17)-[STA]-x(2)-T-[LIVMA]-[RH]-[KRNA]-D- 
CONSENSUS: [LIVMF]. 

NAME: Bacterial regulatory proteins, gntR family signature.
CONSENSUS: [LIVAPKR]-[PILV3-x-[EQriWMR] 
CONSENSUS: PNGSTK]-[RGTLV]-x-[STAIVP]-[LI^ 

NAME: Bacterial regulatory proteins, iclR family signature. 

CONSENSUS: [GA]-x(3)-[DS]-x(2)-E-x(6)-[CSA]-[LIVM]-[GSA]-x(2)-[LIVM]-[FYH]-[DN].
NAME: Bacterial regulatory proteins, lacI family signature.

CONSENSUS: [LIVM]-x-[DE]-[LIVM]-A-x(2)-[STAGV]-x-V-[GSTP]-x(2)-[STAG]-[LIVMA]-x(2)-
CONSENSUS: [LIVMFYAN]-[LIVMC].

NAME: Bacterial regulatory proteins, luxR family signature. 
CONSENSUS: [GIX:]-x(2)-[NSTAVY]-x(2)-flVHGSTA]-x(2>^^ 
CONSENSUS: [NST]-[LIVM]-x(5)-[NRHSA]-[LIVMSTA]-x(2)-[KR].

NAME: Bacterial regulatory proteins, lysR family signature.

CONSENSUS: [NQKRHSTAG]-[LIVMFYTA]-x(2)-[STAGLV]-[STAG]-x(4)-[LIVMYCTQR]-[PSTANLVER]-
CONSENSUS: x-[PSTAGQV]-[PSTAGNVMF]-[LIVMFA]-[STAGH]-x(2)-[LIVMF]-x(2)-[LIVMFW]-
CONSENSUS: [RKEAV]-x(2)-[LIVMFYNTAE]-x(3)-[LIMVT].

NAME: Bacterial regulatory proteins, marR family signature.

CONSENSUS: [STNA]-[LIA]-x-[RNGS]-x(4)-[LM]-[EIV]-x(2)-[GES]-[LFYW]-[LIVC]-x(7)-
CONSENSUS: [DN]-[RKQG]-[RK]-x(6)-T-x(2)-[GA].

NAME: Bacterial regulatory proteins, merR family signature.

CONSENSUS: [GSA]-x-[LIVMFA]-[ASM]-x(2)-[STACLIV]-[GSDENQR]-[LIVC]-[STANHK]-x(3)-
CONSENSUS: [LIVM]-[RHF]-x-[YW]-[DEQ]-x(2,3)-[GHDNQ]-[LIVMF](2).

NAME: Bacterial regulatory proteins, tetR family signature. 
CONSENSUS: G-[LIVMFYS]-x(2.3)-[TS]-[lJVMT>x^^^ 

CONSENSUS: [GPAR]-x-[LIVMF]-[FYST]-x-[HFY]-[FV]-x-[DNST]-K-x(2)-[LIVM].

NAME: Transcriptional antiterminators bglG family signature.
CONSENSUS: [ST]-x-H-x(2)-[FA](2)-[LIVM]-[EQK]-R-x(2)-[QNK].

NAME: Sigma-54 factors family signature 1.

CONSENSUS: P-[LIVM]-x-[IJVM]-x(2)-[LIVM]-A-x(2)-[U^ 

NAME: Sigma-54 factors family signature 2. 
CONSENSUS: R-R-T-[IV]-[AT]-K-Y-R. 
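
The CONSENSUS entries in this listing follow the PROSITE pattern notation: residues are separated by "-", "x" stands for any residue, "[...]" lists the residues allowed at a position, "{...}" lists the residues excluded, and "(n)" or "(n,m)" gives a repeat count or range. As an illustration only, the following minimal Python sketch translates such a pattern into a regular expression. The function name prosite_to_regex and the test sequence are hypothetical and are not part of the original listing; the sketch assumes a cleanly transcribed pattern on a single line and does not handle N- or C-terminal anchors.

```python
import re

# One pattern element: a residue, a [...] set, a {...} exclusion set, or x,
# optionally followed by a repeat count (n) or a repeat range (n,m).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a regex string."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            core = "."                          # any residue
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"   # excluded residues
        else:
            core = residue                      # single residue or [...] set
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(core + repeat)
    return "".join(parts)


# Sigma-54 factors family signature 2, as listed above.
regex = prosite_to_regex("R-R-T-[IV]-[AT]-K-Y-R.")
hit = re.search(regex, "MARRTIAKYRQ")  # hypothetical test sequence
```

With the hypothetical test sequence shown, hit.group() is the matched stretch of residues and hit.start() is its zero-based offset in the sequence.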

NAME: Sigma-54 factors family profile. 

NAME: Sigma-70 factors family signature 1. 

CONSENSUS: [DE]-[LIVMF](2)-[HEQS]-x-G-x-[LIVMFA]-G-L-[LIVMFYE]-x-[GSAM]-[LIVMAP].
NAME: Sigma-70 factors family signature 2.

CONSENSUS: [STN]-x(2)-[DEQ]-[LIVM]-[GAS]-x(4)-[LIVMF]-[PSTG]-x(3)-[LIVMA]-x-[NQR]-
CONSENSUS: [LIVMA]-[EQH]-x(3)-[LIVMFW]-x(2)-[LIVM].
NAME: Sigma-70 factors ECF subfamily signature. 

CONSENSUS: [STAIV]-[PQDEL]-[DE]-[LIV]-[LIVTA]-Q-x-[STAV]-[LIVMFYC]-[LIVMAK]-x-
CONSENSUS: [GSTAIV]-[LIMFYWQ]-x(12,14)-[STAP]-[FYW]-[LIF]-x(2)-[IV].

NAME: Sigma-54 interaction domain ATP-binding region A signature. 
CONSENSUS: [LIVMFY](3)-x-G-[DEQ]-[STE]-G-[STAV]-G-K-x(2)-[LIVMFY].

NAME: Sigma-54 interaction domain ATP-binding region B signature. 

CONSENSUS: [GS]-x-[LIVMF]-x(2)-A-[DNEQASH]-[GNEK]-G-[STIM]-[LIVMFY](3)-[DE]-[EK]-
CONSENSUS: [LIVM]. 

NAME: Sigma-54 interaction domain C-terminal part signature.
CONSENSUS: [FYW]-P-[GS]-N-[LIVM]-R-[EQ]-L-x-[NHAT].

NAME: Sigma-54 interaction domain profile.

NAME: Single-strand binding protein family signature 1.

CONSENSUS: [LIVMF]-[NST]-[KRT]-[LIVM]-x-[LIVMF](2)-G-[NHRK]-[LIVM]-[GST]-x-[DET].

NAME: Single-strand binding protein family signature 2. 

CONSENSUS: T-x-W-[HY]-[RNS]-[LIVM]-x-[LIVMF]-[FY]-[NGKR].

NAME: Bacterial histone-like DNA-binding proteins signature.

CONSENSUS: [GSK]-F-x(2)-[LIVMF]-x(4)-[RKEQA]-x(2)-[RST]-x-[GA]-x-[KN]-P-x-T.
NAME: Dps protein family signature 1.

CONSENSUS: H-[FW]-x-[LIVM]-x-G-x(5)-[LV]-H-x(3)-[DE].
NAME: Dps protein family signature 2.

CONSENSUS: [LIVMFY]-[DH]-x-[LIVM]-[GA]-E-R-x(3)-[LIF]-[GDN]-x(2)-[PA].

NAME: DNA repair protein radC family signature. 
CONSENSUS: H-N-H-P-S-G. 

NAME: recA signature. 

CONSENSUS: A-L-[KR]-[IF]-[FY]-[STA]-[STAD]-[LIVMQ]-R.
NAME: RecF protein signature 1.

CONSENSUS: P-[ED]-x(3)-[LIVM](2)-x-G-[GSAD]-P-x(2)-R-R-x-[FY]-[LIVM]-D.
NAME: RecF protein signature 2.

CONSENSUS: [LIVMFY](2)-x-D-x(2,3)-[SA]-[EH]-L-D-x(2)-[KRH]-x(3)-L.
NAME: RecR protein signature.

CONSENSUS: C-x(2)-C-x(3)-[ST]-x(4)-C-x-I-C-x(4)-R.

NAME: Histone H2A signature. 
CONSENSUS: [AC]-G-L-x-F-P-V. 

NAME: Histone H2B signature. 

CONSENSUS: [KR]-E-[LIVM]-[EQ]-T-x(2)-[KR]-x-[LIVM](2)-x-[PAG]-[DE]-L-x-[KR]-H-A-
CONSENSUS: [LIVM]-[STA]-E-G.

NAME: Histone H3 signature 1. 
CONSENSUS: K-A-P-R-K-Q-L. 

NAME: Histone H3 signature 2. 

CONSENSUS: P-F-x-[RA]-L-[VA]-[KRQ]-[DEG]-[IV].

NAME: Histone H4 signature. 
CONSENSUS: G-A-K-R-H. 

NAME: HMG1/2 signature.

CONSENSUS: [FI]-S-[KR]-K-C-S-[EK]-R-W-K-T-M.

NAME: HMG-I and HMG-Y DNA-binding domain (A+T-hook).
CONSENSUS: [AT]-x(1,2)-[RK](2)-[GP]-R-G-R-P-[RK]-x.

NAME: HMG14 and HMG17 signature. 
CONSENSUS: R-R-S-A-R-L-S-A-[RK]-P. 

NAME: Bromodomain signature. 
CONSENSUS: [STANVF]-x(2)-F-x(4)-[DNS]-x(5,7)-[DENQTF]-Y-[HFY]-x(2)-[LIVMFY]-x(3)-
CONSENSUS: [LIVM]-x(4)-[LIVM]-x(6,8)-Y-x(12,13)-[LIVM]-x(2)-N-[SACF]-x(2)-[FY].

NAME: Bromodomain profile. 

NAME: Chromo domain signature. 

CONSENSUS: [FYL]-x-[LIVMC]-[KR]-W-x-[GDNR]-[FWLE]-x(5,6)-[ST]-W-[ES]-[PSTDN]-x(3)-
CONSENSUS: [LIVMC]. 

NAME: Chromo and chromo shadow domain profile. 

NAME: Regulator of chromosome condensation (RCC1) signature 1.
CONSENSUS: G-x-N-D-x(2)-[AV]-L-G-R-x-T.

NAME: Regulator of chromosome condensation (RCC1) signature 2.

CONSENSUS: [LIVMFA]-[STAGC](2)-G-x(2)-H-[STAGLI]-[LIVMFA]-x-[LIVM].

NAME: Protamine P1 signature.

CONSENSUS: [AV]-R-[NFY]-R-x(2,3)-[ST]-x-S-x-S.

NAME: Nuclear transition protein 1 signature. 
CONSENSUS: S-K-R-K-Y-R-K. 

NAME: Nuclear transition protein 2 signature 1.
CONSENSUS: H-x(3)-H-S-[NS]-S-x-P-Q-S.

NAME: Nuclear transition protein 2 signature 2.
CONSENSUS: K-x-R-K-x(2)-E-G-K-x(2)-K-[KR]-K.

NAME: Ribosomal protein L1 signature.

CONSENSUS: [IM]-x(2)-[LIVA]-x(2,3)-[LIVM]-G-x(2)-[LMS]-[GSNH]-[PTKR]-[KRAV]-G-x-
CONSENSUS: [LMF]-P-[DENSTK].

NAME: Ribosomal protein L2 signature. 

CONSENSUS: P-x(2)-R-G-[STAIV](2)-x-N-[APK]-x-[DE].

NAME: Ribosomal protein L3 signature. 

CONSENSUS: [FL]-x(6)-[DN]-x(2)-[AGS]-x-[ST]-x-G-[KRH]-G-x(2)-G-x(3)-R.
NAME: Ribosomal protein L5 signature.

CONSENSUS: [LIVM]-x(2)-[LIVM]-[STAC]-[GEMQV]-x(2)-[LIVMA]-x-[STC]-x-[STAG]-[KR]-
CONSENSUS: x-[STA].

NAME: Ribosomal protein L6 signature 1.
CONSENSUS: [PS]-[DENS]-x-Y-K-[GA]-K-G-[LIVM].

NAME: Ribosomal protein L6 signature 2. 

CONSENSUS: Q-x(3HLF/M]-x(2)-[KR]-x(2)-R-x-F-x-D^ 

NAME: Ribosomal protein L9 signature. 

CONSENSUS: G-x(2)-[GN]-x(4)-V-x(2)-G-[FY]-x(2)-N-[FY]-L-x(5)-[GA]-x(3)-[STN].
NAME: Ribosomal protein L10 signature.

CONSENSUS: [DEH]-x(2)-[GS]-[LIVMF]-[STN]-[VA]-x-[DEQK]-[LIVMA]-x(2)-[LIM]-R.

NAME: Ribosomal protein L11 signature.

CONSENSUS: [RKhH-x-[UVM]-x-G-[ST]-x(2HSNQHU^ 

NAME: Ribosomal protein L13 signature.
CONSENSUS: [UVNq-[KRVHGK]-M-[LIV]-[PS^ 
CONSENSUS: [LFY]-x-[GDN]. 

NAME: Ribosomal protein L14 signature. 

CONSENSUS: [GA]-[LIV](3)-x(9,10)-[DNS]-G-x(4)-[FY]-x(2)-[NT]-x(2)-V-[LIV].
NAME: Ribosomal protein L15 signature.

CONSENSUS: K-[LIVM](2)-[GAL]-x-[GT]-x-[LIVMA]-x(2,5)-[LIVM]-x-[LIVMF]-x(3,4)-
CONSENSUS: [LIVMFC]-[ST]-x(2)-A-x(3)-[LIVM]-x(3)-G.

NAME: Ribosomal protein L16 signature 1.

CONSENSUS: [KR]-R-x-[GSAQ-[KQVA]-(UVM]-W-[LIVM]-[KR]-[L^ 

NAME: Ribosomal protein L16 signature 2.
CONSENSUS: R-M-G-x-[GR]-K-G-x(4)-[FWKR].
NAME: Ribosomal protein L17 signature.

CONSENSUS: l-x4ST]-[GT|-x(2)-[KR)-x-K-x(6)-[DE]-x-tUMV]-[LIVMTJ-T-^ 
NAME: Ribosomal protein L19 signature.

CONSENSUS: [RT]-[KRSVY]-[GSA]-x-V-[RS]-[KR]-[SA]-K-L-Y-Y-L-R.
NAME: Ribosomal protein L20 signature.

CONSENSUS: K-x(3)-[KRC]-x-[LIVM]-W-[IV]-[STNALV]-R-[LIVM]-N-x(3)-[RKH].
NAME: Ribosomal protein L21 signature.

CONSENSUS: [IVT]-x(3)-[KR]-x(3)-[KRQ]-K-x(6)-G-[HF]-R-[RQ]-x(2)-T.
NAME: Ribosomal protein L22 signature.

CONSENSUS: [RKQN]-x(4)-[RH]-[GAS]-x-G-[KRQS]-x(9)-[HDN]-[LIVM]-x-[LIVMS]-x-[LIVM].
NAME: Ribosomal protein L23 signature. 

CONSENSUS: [RK](2)^AM]-[IVFni-[IV]-[RKT]-L-tSTANQig-x(7)-[LIVMFT] 

NAME: Ribosomal protein L24 signature. 

CONSENSUS: [GDEhn-D-x-V-x-[^[IJVMA]-x^-x(2)-^ 

NAME: Ribosomal protein L27 signature. 
CONSENSUS: G-x-[LIVM](2)-x-R-Q-R-G-x(5)-G. 

NAME: Ribosomal protein L29 signature. 

CONSENSUS: [KNQS]-[PSTL]-x(2)-[LIMFA]-[KRGSAN]-x-[LIVYSTA]-[KR]-[KRH]-[DESTANRL]-
CONSENSUS: [LIV]-A-[KRCQVT]-[LIVMA].

NAME: Ribosomal protein L30 signature.

CONSENSUS: [IVT]-[LIVM]-x(2)-[LF]-x-[LI]-x-[KRHQEG]-x(2)-[STNQH]-x-[IVT]-
CONSENSUS: x(10)-[LMS]-[LIV]-x(2)-[LIVA]-x(2)-[LMFY]-[IVT].

NAME: Ribosomal protein L31 signature. 

CONSENSUS: H-P-F-[FY]-[TI]-x<9)^R-[AV]-x-[KR] . 

NAME: Ribosomal protein L33 signature. 

CONSENSUS: Y-x-[ST]-x-[KR]-[NS]-x(4)-[PAT]-x(1,2)-[LIVM]-[EA]-x(2)-K-[FY]-[CSD].
NAME: Ribosomal protein L34 signature.

CONSENSUS: K-[RG]-T-[FYWL]-[EQS]-x(5)-[KRHS]-x(4,5)-G-F-x(2)-R.
NAME: Ribosomal protein L35 signature.

CONSENSUS: [LIVM]-K-[TV]-x(2)-[GSA]-[SAIL]-x-K-R-[LIVMFY]-[KRL].

NAME: Ribosomal protein L36 signature. 
CONSENSUS: C-x(2)-C-x<2)-[UVM]-x-R-x(3)-[L^ 

NAME: Ribosomal protein L1e signature.

CONSENSUS: N-x(3)-[KR]-x(2)-A-[LIVT]-x-S-A-[LIV]-x-A-[ST]-[SGA]-x(7)-[RK]-G-H.

NAME: Ribosomal protein L6e signature. 

CONSENSUS: N-x(2)-P-L-R-R-x(4)-[FY]-V-I-A-T-S-x-K.

NAME: Ribosomal protein L7Ae signature.

CONSENSUS: [CA]-x(4)-[IV]-P-[FY]-x(2)-[LIVM]-x-[GSQ]-[KRQ]-x(2)-L-G.

NAME: Ribosomal protein L10e signature.

CONSENSUS: R-x-A-[FYW]-G-K-[PA]-x-G-x(2)-A-R-V. 

NAME: Ribosomal protein L13e signature. 

CONSENSUS: [KR]-Y-x(2)-K-[LIVM]-R-[STA]-G-[KR]-G-F-[ST]-L-x-E.
NAME: Ribosomal protein L15e signature.

CONSENSUS: [DE]-[KR]-A-R-x-I^G-[ITl-x.ISAP]-x(2)^-[LlVMrnr](4)-R.x-R.V-x.R-G. 
NAME: Ribosomal protein L18e signature. 

CONSENSUS: [KM]-x-L-x(2>[PS]-[KR]-x(2)-[RH]-[PSA]-x4LIVM]-[NS]-[UVM]^^ 
CONSENSUS: [LIVM]. 

NAME: Ribosomal protein L19e signature.

CONSENSUS: R-x-[KR]-x(5)-[IOl]-x(3)-[KRH]-x(2)-G-x-G-x-R-x-G-x(3)-A-R-x(3)-[KQ]- 
CONSENSUS: x(2)-W-x(7)-R-x(2)-L-x(3)-R.
NAME: Ribosomal protein L21e signature.

CONSENSUS: G-[DE]-x-V-x(10)-[GV]-x(2)-[FYH]-x(2)-[FY]-x-G-x-T-G.
NAME: Ribosomal protein L24e signature.

CONSENSUS: [FY]-x-[GS]-x(2)-[IV]-x-P-G-x-G-x(2)-[FYV]-x-[KRHE]-x-D.

NAME: Ribosomal protein L27e signature.
CONSENSUS: G-K-N-x-W-F-F-x-K-L-R-F.

NAME: Ribosomal protein L30e signature 1. 

CONSENSUS: [STA]-x(5)-G-x-[QKR]-x(2)-[LIVMW^^ 

NAME: Ribosomal protein L30e signature 2. 

CONSENSUS: [DE]-L-G-[STA]-x(2)-G-[KR]-x(6)-[LIVM]-x-[LIVM]-x-[DEN]-x-G.
NAME: Ribosomal protein L31e signature. 

CONSENSUS: V-[KR]-[LIVM]-x(3)-[LIVM]-N-x-[AK]-x-W-x-[KR]-G. 
NAME: Ribosomal protein L32e signature. 

CONSENSUS: F-x-R-x(4)-[KR]-x(2)-[KR]-[LIVM]-x(3)-W-R-[KR]-x(2)-G.

NAME: Ribosomal protein L34e signature. 
CONSENSUS: Y-x-[ST]-x-S-[NY]-x(5)-[KR]-T-P-G. 

NAME: Ribosomal protein L35Ae signature. 

CONSENSUS: G-K-[LIVM]-x-R-x-H-G-x(2)-G-x-V-x-A-x-F-x(3)-[LI]-P.

NAME: Ribosomal protein L36e signature.

CONSENSUS: P-Y-E-[KR]-R-x-[LIVM]-[DE]-[LIVM](2)-[KR].

NAME: Ribosomal protein L37e signature.

CONSENSUS: G-T-x-[SA]-x-G-x-[KR]-x(3)-[ST]-x(0,1)-H-x(2)-C-x-R-C-G.
NAME: Ribosomal protein L39e signature.

CONSENSUS: [KRA]-T-x(3)-[LIVM]-[KRQF]-x-[NHS]-x(3)-R-[NHY]-W-R-R.

NAME: Ribosomal protein L44e signature. 
CONSENSUS: K-x-[TV]-K-K-x(2)-L-[KR]-x(2)-C. 

NAME: Ribosomal protein S2 signature 1. 

CONSENSUS: [LIVMFA]-x(2)-[LIVMFYC](2)-x-[STAC]-[GSTANQEKR]-[STALV]-[HY]-[LIVMF]-G.
NAME: Ribosomal protein S2 signature 2.

CONSENSUS: P-x(2)-[LIVMF](2)-[LIVMS]-x-[GDN]-x(3)-[DENL]-x(3)-[LIVM]-x-E-x(4)-
CONSENSUS: [GNQKRH]-[LIVM]-[AP].

NAME: Ribosomal protein S3 signature.

CONSENSUS: [GSTA]-[KR]-x(6)-G-x-[LIVMT]-x(2)-[NQSCH]-x(1,3)-[LIVFCA]-x(3)-[LIV]-
CONSENSUS: [DENQ]-x(7)-[LMT]-x(2)-G-x(2)-G.

NAME: Ribosomal protein S4 signature. 

CONSENSUS: [LIVM]-[DE]-x-R-L-x(3)-[LIVMC]-[VMFYH^ 
CONSENSUS: [SA]-[KR]-x-[LIVMF](2).

NAME: Ribosomal protein S5 signature.

CONSENSUS: G-[KRQ]-x(3)-[FY]-x-[ACV]-x(2)-[LIVMA]-[LIVM]-[AG]-[DN]-x(2)-G-x-
CONSENSUS: [LIVM]-G-x-[SAG]-x(5,6)-[DEQ]-[LIVM]-x(2)-A-[LIVMF].

NAME: Ribosomal protein S6 signature. 

CONSENSUS: G-x-[KRC]-[DENQRH]-L-[SA]-Y-x-I-[KRNSA].
NAME: Ribosomal protein S7 signature.

CONSENSUS: [DENSK]-x-[LIVMET]-x(3)-[LIVMFT](2)-x(6)-G-K-[KR]-x(5)-[LIVMF]-[LIVMFC]-
CONSENSUS: x(2)-[STA].

NAME: Ribosomal protein S8 signature. 

CONSENSUS: [GE]-x(2)-[LIV](2)-[STY]-T-x(2)-G-[LIVM](2)-x(4)-[AG]-[KRHAYI].
NAME: Ribosomal protein S9 signature.

CONSENSUS: G-G-G-x(2)-[GSA]-Q-x(2)-[SA]-x(3)-[GSA]-x-[GSTAV]-[KR]-[GSAL]-[LIF].
NAME: Ribosomal protein S10 signature.

CONSENSUS: [AV]-x(3)-[GDNSR]-[LIVMSTA]-x(3)-G-P-[LIVM]-x-[LIVM]-P-T.
NAME: Ribosomal protein S11 signature.

CONSENSUS: [LIVMF]-x-[GSTAC]-[LIVMF]-x(2)-[GSTAL]-x(0,1)-[GSN]-[LIVMF]-x-[LIVM]-
CONSENSUS: x(4)-[DEN]-x-T-P-x-[PA]-[STCH]-[DN].

NAME: Ribosomal protein S12 signature. 
CONSENSUS: [RK]-x-P-N-S-[AR]-x-R. 

NAME: Ribosomal protein S13 signature. 

CONSENSUS: [KRQS]-G-x-R-H-x(2)-[GSNH]-x(2)-[LIVMC]-R-G-Q. 
NAME: Ribosomal protein S14 signature. 

CONSENSUS: [RP]-x(0,1)-C-x(11,12)-[LIVMF]-x-[LIVMF]-[SC]-[RG]-x(3)-[RN].
NAME: Ribosomal protein S15 signature. 

CONSENSUS: [UVM]-x(2^H-[UVMFY]-x(5)-I>x(2).[SAGr^-x(3)-[U I J-x(9>-[UVM]^ 
CONSENSUS: [FY].

NAME: Ribosomal protein S16 signature. 

CONSENSUS: [LIVMT]-x-[LIVM]-[KR]-L-[STAK]-R-x-G-[AKR]. 
NAME: Ribosomal protein S17 signature. 

CONSENSUS: G-D-x-[LIV]-x-[LIVA]-x-[QEK]-x-[RK]-P-[LIV]-S.
NAME: Ribosomal protein S18 signature. 

CONSENSUS: [rV]-lDY]-Y-x(2)-[LIVMT]-x(2)-IUVM]-x(2HHY^-lLIVM]-[ST).[DERP].x- 
CONSENSUS: [GY]-K-[LIVM]-x(3)-R-[LIVMAS]. 

NAME: Ribosomal protein S19 signature. 

CONSENSUS: [STDNQ]-G-[KRQM]-x(6)-[LIVM]-x(4)-[LIVM]-[GSD]-x(2)-[LF]-[GAS]-[DE]-F-
CONSENSUS: x(2)-[ST].

NAME: Ribosomal protein S21 signature.

CONSENSUS: [DE]-x-A-[LY]-[KR]-R-F-K-[KR]-x(3)-[KR].

NAME: Ribosomal protein S3Ae signature. 

CONSENSUS: [LIV]-x-[GH]-R-[IV]-x-E-x-[SC]-L-x-D-L. 

NAME: Ribosomal protein S4e signature. 

CONSENSUS: H-x-K-R-[LIVM]-[SAN]-x-P-x(2)-W-x-[LIVM]-x-[KR].

NAME: Ribosomal protein S6e signature.

CONSENSUS: [LIVM]-[STAMR]-G-G-x-D-x(2)-G-x-P-M.

NAME: Ribosomal protein S7e signature. 

CONSENSUS: [KR]-L-x-R-E-L-E-K-K-F-[SAP]-x-[KR]-H. 

NAME: Ribosomal protein S8e signature. 

CONSENSUS: R-x(2)-T-G-[GA]-x(5)-[HR]-K-[KR]-x-K-x-E-[LM]-G.

NAME: Ribosomal protein S12e signature.

CONSENSUS: A-L-[KRQP]-x-V-L-x(2)-[SA]-x(3)-[DN]-G-L.

NAME: Ribosomal protein S17e signature.

CONSENSUS: A-x-I-x-[ST]-K-x-L-R-N-[KR]-I-A-G-[FY]-x-T-H.
NAME: Ribosomal protein S19e signature. 

CONSENSUS: P-x(6)-[SAN]-x(2)-[LIVMA]-x-R-x-[ALIV]-[LV]-Q-x-L-[EQ]. 

NAME: Ribosomal protein S21e signature. 
CONSENSUS: L-Y-V-P-R-K-C-S-[SA]. 

NAME: Ribosomal protein S24e signature. 

CONSENSUS: [FA]-G-x(2)-[KR]-[STA]-x-G-[FY]-[GA]-x-[LIVM]-Y-[DN]-[SN].

NAME: Ribosomal protein S26e signature.
CONSENSUS: [YH]-C-V-S-C-A-I-H.

NAME: Ribosomal protein S27e signature. 

CONSENSUS: [QK]-C-x(2)-C-x(6)-F-[GS]-x-[PSA]-x(5)-C-x(2)-C-[GS]-x(2)-L-x(2)-P-x-G.

NAME: Ribosomal protein S28e signature. 
CONSENSUS: E-[ST]-E-R-E-A-R-x-L. 

NAME: DNA mismatch repair proteins mutL / hexB / PMS1 signature.
CONSENSUS: G-F-R-G-E-A-L.

NAME: DNA mismatch repair proteins mutS family signature. 

CONSENSUS: [ST]-[LIVM]-x-[LIVM]-x-D-E-[LIVMY]-[GC]-[RKH]-G-[GST]-x(4)-G.
NAME: mutT domain signature.

CONSENSUS: G-x(5)-E-x(4)-[STAGC]-[LIVMAC]-x-R-E-[LIVMFT]-x-E-E.
NAME: DnaA protein signature.

CONSENSUS: I-[GA]-x(2)-[LIVMF]-[SGDNK]-x(0,1)-[KR]-x-H-[STP]-[STV]-[LIVM](2)-x-
CONSENSUS: [SA]-x(2)-[KRE]-[LIVM].

NAME: Small, acid-soluble spore proteins, alpha/beta type, signature 1. 
CONSENSUS: K-x-E-[LIV]-A-x-[DE]-[LIVMF]-G-[LIVMF].

NAME: Small, acid-soluble spore proteins, alpha/beta type, signature 2.
CONSENSUS: [KR]-[SAQ]-x-G-x-V-G-G-x-[LIVM]-x-[KR](2)-[LIVM](2).

NAME: Zinc-containing alcohol dehydrogenases signature.
CONSENSUS: G-H-E-x(2)-G-x(5)-[GA]-x(2)-[IVSAC].

NAME: Quinone oxidoreductase / zeta-crystallin signature.

CONSENSUS: [GSD]-[DEQH]-x(2)-L-x(3)-[SA](2)-G-G-x-G-x(4)-Q-x(2)-[KR].

NAME: Iron-containing alcohol dehydrogenases signature 1.

CONSENSUS: [STALIV]-[LIVF]-x-[DE]-x(6,7)-P-x(4)-[ALIV]-x-[GST]-x(2)-D-[TAIVM]-
CONSENSUS: [LIVMF]-x(4)-E.

NAME: Iron-containing alcohol dehydrogenases signature 2. 

CONSENSUS: [GSW]-x-[LIVTSACD]-[GH]-x(2)-[GSAE]-[GSHYQ]-x-[LIVTP]-[GAST]-[GAS]-x(3)-
CONSENSUS: [LIVMT]-x-[HNS]-[GA]-x-[GTAC].

NAME: Short-chain dehydrogenases/reductases family signature. 

CONSENSUS: [LIVSPADNK]-x(12)-Y-[PSTAGNCV]-[STAGNQCIVM]-[STAGC]-K-{PC}-[SAGFR]-
CONSENSUS: [LIVMSTAGD]-x(2)-[LIVMFYW]-x(3)-[LIVMFYWGAPTHQ]-[GSACQRHM].

NAME: Aldo/keto reductase family signature 1.

CONSENSUS: G-[FY]-R-[HSAL]-[LIVMF]-D-[STAGC]-[AS]-x(5)-E-x(2)-[LIVM]-G.
NAME: Aldo/keto reductase family signature 2.

CONSENSUS: [LIVMFY]-x(9)-[KREQ]-x-[LIVM]-G-[LIVM]-[SC]-N-[FY].
NAME: Aldo/keto reductase family putative active site signature.

CONSENSUS: [LIVM]-[PAIV]-[KR]-[ST]-x(4)-R-x(2)-[GSTAEQK]-[NSL]-x(2)-[LIVMFA].
NAME: Homoserine dehydrogenase signature.

CONSENSUS: A-x(3)-G-[LIVMFY]-[STAG]-x(2,3)-[DNS]-P-x(2)-D-[LIVM]-x-G-x-D-x(3)-K.
NAME: NAD-dependent glycerol-3-phosphate dehydrogenase signature. 

CONSENSUS: G-[AT]-[LIVM]-K-[DN]-[LIVM](2)-A-x-[GA]-x-G-[LIVMF]-x-[DE]-G-[LIVM]-x-
CONSENSUS: [LIVMFYW]-G-x-N.

NAME: FAD-dependent glycerol-3-phosphate dehydrogenase signature 1.
CONSENSUS: [IV]-G-G-G-x(2)-G-[STACV]-G-x-A-x-D-x(3)-R-G. 

NAME: FAD-dependent glycerol-3-phosphate dehydrogenase signature 2. 
CONSENSUS: G-G-K-x(2)-[GSTE]-Y-R-x(2)-A.

NAME: Mannitol dehydrogenases signature.

CONSENSUS: [LIVMY]-x-[FS]-x(2)-[STAGCV]-x-V-D-R-[IV]-x-[PS].
NAME: Histidinol dehydrogenase signature.

CONSENSUS: I-D-x(2)-A-G-P-[ST]-E-[LIVS]-[LIVMA](3)-[AC]-x(3)-A-x(4)-[LIVM]-[AV]-
CONSENSUS: [SACL]-[DE]-[LIVMFC]-[LIVM]-[SA]-x(2)-E-H.

NAME: L-lactate dehydrogenase active site.
CONSENSUS: [LIVMA]-G-[EQ]-H-G-[DN]-[ST].

NAME: D-isomer specific 2-hydroxyacid dehydrogenases NAD-binding signature. 

CONSENSUS: [LIVMA]-[AG]-[IVT]-[LIVMFY]-[AG]-x-G-[NHKRQGSAC]-[LIV]-G-x(13,14)-
CONSENSUS: [LIVMT]-x(2)-[FYWCTH]-[DNSTK].

NAME: D-isomer specific 2-hydroxyacid dehydrogenases signature 2. 
CONSENSUS: [LIVMFYWA]4LIVFYWC]<2HSAC3-PNQHR]-[IVF^ 
CONSENSUS: P-x(4)-[STN]-x(2)-[LIVMF]-x-[GSDN].

NAME: D-isomer specific 2-hydroxyacid dehydrogenases signature 3.

CONSENSUS: [LMFATC]-[KPQ]-x-[GSTDN]-x-[LIVMFYWR]-[LIVMFYW](2)-N-x-[STAGC]-R-[GP]-x-
CONSENSUS: [LIVH]-[LIVMC]-[DNV].

NAME: 3-hydroxyisobutyrate dehydrogenase signature. 
CONSENSUS: [LIVMFY](2)-G-L-G-x-[MQ]-G-x-[PGS]-[MA]-[SA]. 

NAME: Hydroxymethylglutaryl-coenzyme A reductases signature 1. 
CONSENSUS: [RKH]-x(6)-D-x-M-G-x-N-x-[LIVMA].

NAME: Hydroxymethylglutaryl-coenzyme A reductases signature 2.
CONSENSUS: [LIVM]-G-x-[LIVM]-G-G-[AG]-T.

NAME: Hydroxymethylglutaryl-coenzyme A reductases signature 3.

CONSENSUS: A-[LIVM]-x-[STAN]-x(2)-[LI]-x-[KRNQ]-[GSA]-H-[LM]-x-[FYLH].

NAME: Hydroxymethylglutaryl-coenzyme A reductases profile. 

NAME: 3-hydroxyacyl-CoA dehydrogenase signature. 

CONSENSUS: [DNE]-x(2)-[GA]-F-[LIVMFY]-x-[NT]-R-x(3)-[PA]-[LIVMFY](2)-x(5)-
CONSENSUS: [LIVMFYCT]-[LIVMFY]-x(2)-[GV].

NAME: Malate dehydrogenase active site signature. 

CONSENSUS: [LIVM]-T-[TRKMN]-L-D-x(2)-R-[STA]-x(3)-[LIVMFY].
NAME: Malic enzymes signature.

CONSENSUS: F-x-[DV]-D-x(2)-G-T-[GSA]-x-[IV]-x-[LIVMA]-[GAST](2)-[LIVMF](2).
NAME: Isocitrate and isopropylmalate dehydrogenases signature. 

CONSENSUS: [NS]-[LIMYT]-[FYDN]-G-[DNT]-[IMVY]-x-[STGDN]-[DN]-x(2)-[SGAP]-x(3,4)-G-
CONSENSUS: [STG]-[LIVMPA]-G-[LIVMF].

NAME: 6-phosphogluconate dehydrogenase signature. 
CONSENSUS: [LIVM]-x-D-x(2)-[GA]-[NQS]-K-G-T-G-x-W.

NAME: Glucose-6-phosphate dehydrogenase active site. 
CONSENSUS: D-H-Y-L-G-K-[EQK].

NAME: IMP dehydrogenase / GMP reductase signature.

CONSENSUS: [LIVM]-[RK]-[LIVM]-G-[LIVM]-G-x-G-S-[LIVM]-C-x-T.

NAME: Bacterial quinoprotein dehydrogenases signature 1.
CONSENSUS: [DEN]-W-x(3>^-puq-x(6HFi^^ 

NAME: Bacterial quinoprotein dehydrogenases signature 2. 

CONSENSUS: W-x(4)-Y-D-x(3)-[DN]-[LIVMFY](4)-x(2)-G-x(2)-[STA]-P.

NAME: FMN-dependent alpha-hydroxy acid dehydrogenases active site. 
CONSENSUS: S-N-H-G-[AG]-R-Q. 

NAME: GMC oxidoreductases signature 1.

CONSENSUS: [GAHRKN>x-[UV]-G(2MGST}(2)-x-[LI^ 

CONSENSUS: [DNESH]. 

NAME: GMC oxidoreductases signature 2.

CONSENSUS: [GS]-[PSTA]-x(2)-[ST]-P-x-[LIVM](2)-x(2)-S-G-[LIVM]-G.
NAME: Eukaryotic molybdopterin oxidoreductases signature.

CONSENSUS: [GA]-x(3)-[KRNQHT]-x(11,14)-[LIVMFYWS]-x(8)-[LIVMF]-x-C-x(2)-[DEN]-R-
CONSENSUS: x(2)-[DE]. 

NAME: Prokaryotic molybdopterin oxidoreductases signature 1.
CONSENSUS: [STAr^-x-[CH]-x(2,3K-[STAGHGS^ 
CONSENSUS: [DENQKHT]. 

NAME: Prokaryotic molybdopterin oxidoreductases signature 2. 

CONSENSUS: [STA]-x-[STAC](2)-x(2)-[STA]-D-[LIVMY](2)-L-P-x-[STAC](2)-x(2)-E.
NAME: Prokaryotic molybdopterin oxidoreductases signature 3.

CONSENSUS: A-x(3)-[GDT]-I-x-[DNQTK]-x-[DEA]-x-[LIVM]-x-[LIVMC]-x-[NS]-x(2)-[GS]-
CONSENSUS: x(5)-A-x-[LIVM]-[ST].
NAME: Aldehyde dehydrogenases glutamic acid active site.

CONSENSUS: [LIVMFGA]-E-[LIMSTAC]-[GS]-G-[KNLM]-[SADN]-[TAPFV].

NAME: Aldehyde dehydrogenases cysteine active site. 

CONSENSUS: [FYLVA]-x(3)-G-[QE]-x-C-[LIVMGSTANC]-[AGCN]-x-[GSTADNEKR].

NAME: Aspartate-semialdehyde dehydrogenase signature.

CONSENSUS: [LIVM]-[SADN]-x(2)-C-x-R-[LIVM]-x(4)-[GSC]-H-[STA].

NAME: Glyceraldehyde 3-phosphate dehydrogenase active site.
CONSENSUS: [ASV]-S-C-[NT]-T-x(2)-[LIM].

NAME: N-acetyl-gamma-glutamyl-phosphate reductase active site.

CONSENSUS: [LIVM]-[GSA]-x-P-G-C-[FY]-[AVP]-T-[GA]-x(3)-[GTAC]-[LIVM]-x-P.

NAME: Gamma-glutamyl phosphate reductase signature.

CONSENSUS: V-x(5)-A-[LIV]-x-H-I-x(2)-[HY]-[GS]-[ST]-x-H-[ST]-[DE]-x-I.

NAME: Dihydrodipicolinate reductase signature. 
CONSENSUS: E-[IV]-x-E-x-H-x(3)-K-x-D-x-P-S-G-T-A. 

NAME: Dihydroorotate dehydrogenase signature 1.

CONSENSUS: [GS]-x(4)-[GK]-[STA]-[IVSTA]-[GT]-x(3)-[NQR]-x-G-[NH]-x(2)-P-[RT].
NAME: Dihydroorotate dehydrogenase signature 2.

CONSENSUS: [LIV](2)-[GSA]-x-G-G-[IV]-x-[STGN]-x(3)-[ACV]-x(6)-G-A.
NAME: Coproporphyrinogen III oxidase signature. 

CONSENSUS: K-x-W-C-x(2)-[FYH](3)-[LIVM]-x-H-R-x-E-x-R-G-[LIVM]-G-G-[LIVM]-F-F-D.

NAME: Fumarate reductase / succinate dehydrogenase FAD-binding site. 
CONSENSUS: R-[ST]-H-[ST]-x(2)-A-x-G-G. 

NAME: Acyl-CoA dehydrogenases signature 1.

CONSENSUS: [GAC]-[LIVM]-[ST]-E-x(2)-[GSAN]-G-[ST]-D-x(2)-[GSA].
NAME: Acyl-CoA dehydrogenases signature 2.

CONSENSUS: [QDE]-x(2)-G-[GS]-x-G-[LIVMFY]-x(2)-[DEN]-x(4)-[KR]-x(3)-[DEN].

NAME: Alanine dehydrogenase & pyridine nucleotide transhydrogenase signature 1. 
CONSENSUS: G-[LIVM]-P-x-E-x(3)-N-E-x(1,3)-R-V-A-x-[ST]-P-x-[GST]-V-x(2)-L-x-[KRH]-
CONSENSUS: x-G.

NAME: Alanine dehydrogenase & pyridine nucleotide transhydrogenase signature 2.
CONSENSUS: [LIVM](2)-G-[GA]-G-x-A-G-x(2)-[SA]-x(3)-[GA]-x-[SG]-[LIVM]-G-A-x-V-
CONSENSUS: x(3)-D.

NAME: Glu / Leu / Phe / Val dehydrogenases active site.
CONSENSUS: [LIV]-x(2)-G-G-[SAG]-K-x-[GV]-x(3)-[DNST]-[PL].

NAME: D-amino acid oxidases signature. 

CONSENSUS: [LIVM](2)-H-[NHA]-Y-G-x-[GSA](2)-x-G-x(5)-G-x-A.

NAME: Pyridoxamine 5'-phosphate oxidase signature.
CONSENSUS: [LIVF]-E-F-W-[QHG]-x(4)-R-[LIVM]-H-[DNE]-R.

NAME: Copper amine oxidase topaquinone signature.

CONSENSUS: [LIVM]-[LIVMA]-[LIVM]-x(4)-T-x(2)-N-Y-[DE]-[YN].

NAME: Copper amine oxidase copper-binding site signature.
CONSENSUS: T-x-G-x(2)-H-[LIVMF]-x(3)-E-[DE]-x-P.

NAME: Lysyl oxidase putative copper-binding region signature. 
CONSENSUS: W-E-W-H-S-C-H-Q-H-Y-H. 

NAME: Delta 1-pyrroline-5-carboxylate reductase signature.

CONSENSUS: [PALF]-x(2,3)-[LIV]-x(3)-[LIVM]-[STAC]-[STV]-x-[GAN]-G-x-T-x(2)-[AG]-
CONSENSUS: [LIV]-x(2)-[LMF]-[DENQK].

NAME: Dihydrofolate reductase signature. 
CONSENSUS: [LVAGC)-[Ufl-G-x(4MLIVMF]-P^^ 

NAME: Tetrahydrofolate dehydrogenase/cyclohydrolase signature 1. 
CONSENSUS: [E^x-[EQ]q-[UVM](2)-x<2)-[lJVM]-x(2>^ 
CONSENSUS: Q-L-P-[LV].

NAME: Tetrahydrofolate dehydrogenase/cyclohydrolase signature 2. 
CONSENSUS: P-G-G-V-G-P-[MF]-T-[IV]. 

NAME: Oxygen oxidoreductases covalent FAD-binding site. 

CONSENSUS: P-x(10)-[DE]-[LIVM]-x(3)-[LIVM]-x(9)-[LIVM]-x(3)-[GSA]-[GST]-G-H.

NAME: Pyridine nucleotide-disulphide oxidoreductases class-I active site.
CONSENSUS: G-G-x-C-[LIVA]-x(2)-G-C-[LIVM]-P.

NAME: Pyridine nucleotide-disulphide oxidoreductases class-II active site.
CONSENSUS: C-x(2)-C-D-[GA]-x(2,4)-[FY]-x(4)-[LIVM]-x-[LIVM](2)-G(3)-[DN].

NAME: Respiratory-chain NADH dehydrogenase subunit 1 signature 1. 

CONSENSUS: G-[LIVMFYKRS]-[LIVMAGP]-Q-x-[LIVMFY]-x-D-[AGIM]-[LIVMFTA]-K-[LVMYST]-
CONSENSUS: [LIVMFYG]-x-[KR]-[EQG].

NAME: Respiratory-chain NADH dehydrogenase subunit 1 signature 2. 

CONSENSUS: P-F-D-[LIVMFYQ]-[STAGPVM]-E-[GAC]-E-x-[EQ]-[LIVMS]-x(2)-G.

NAME: Respiratory-chain NADH dehydrogenase 20 Kd subunit signature.

CONSENSUS: [GN]-x-D-[KRST]-[LIVMF](2)-P-[IV]-D-[LIVMFYW](2)-x-P-x-C-P-[PT].

NAME: Respiratory-chain NADH dehydrogenase 24 Kd subunit signature. 
CONSENSUS: D-x(2)-F-[ST]-x(5)-C-L-G-x-C-x(2)-[GA]-P. 

NAME: Respiratory chain NADH dehydrogenase 30 Kd subunit signature. 

CONSENSUS: E-R-E-x(2)-[DE]-[LIVMF](2)-x(6)-[HK]-x(3)-[KRP]-x-[LIVM]-[LIVMS].

NAME: Respiratory chain NADH dehydrogenase 49 Kd subunit signature.
CONSENSUS: [LIVMH]-H-[RT]-[GA]-x-E-K-[LIVMT]-x-E-x-[KRQ].

NAME: Respiratory-chain NADH dehydrogenase 51 Kd subunit signature 1.
CONSENSUS: G-[AM]-G-[AR]-Y-[LIVM]-C-G-[DE](2)-[STA](2)-[LIM](2)-[EN]-S.

NAME: Respiratory-chain NADH dehydrogenase 51 Kd subunit signature 2. 
CONSENSUS: E-S-C-G-x-C-x-P-C-R-x-G. 

NAME: Respiratory-chain NADH dehydrogenase 75 Kd subunit signature 1.
CONSENSUS: P-x(2)-C-[YWS]-x(7)-G-x-C-R-x-C. 

NAME: Respiratory-chain NADH dehydrogenase 75 Kd subunit signature 2. 
CONSENSUS: C-P-x-C-[DE]-x-[GS](2)-x-C-x-L-Q. 

NAME: Respiratory-chain NADH dehydrogenase 75 Kd subunit signature 3. 
CONSENSUS: R-C-[LIVM]-x-C-x-R-C-[LIVM]-x-[FY].

NAME: Nitrite and sulfite reductases iron-sulfur/siroheme-binding site.
CONSENSUS: [STV]-G-C-x(3)-C-x(6)-[DE]-[LIVMF]-[GAT]-[LIVMF].

NAME: Uricase signature.

CONSENSUS: L-x-[LV]-L-K-[ST]-T-x-S-x-F-x(2)-[FY]-x(4)-[FY].

NAME: Heme-copper oxidase catalytic subunit, copper B binding region signature. 
CONSENSUS: [YWG]-[LIVFYWTA](2)-[VGS]-H-[LNP]-x-V-x(44,47)-H-H.

NAME: CO II and nitrous oxide reductase dinuclear copper centers signature.
CONSENSUS: V-x-H-x(33,40)-C-x(3)-C-x(3)-H-x(2)-M.

NAME: Cytochrome c oxidase subunit Vb, zinc binding region signature.
CONSENSUS: [LIVM](2)-[FYW]-x(10)-C-x(2)-C-G-x(2)-[FY]-K-L.

NAME: Multicopper oxidases signature 1.

CONSENSUS: G-x-[FYW]-x-[LIVMFYW]-x-[CST]-x(8)-G-[LM]-x(3)-[LIVMFYW].

NAME: Multicopper oxidases signature 2. 
CONSENSUS: H-C-H-x(3)-H-x(3)-[AG]-[LM]. 

NAME: Peroxidases proximal heme-ligand signature. 
CONSENSUS: [DET]-rUVMTA]-x(2)-[LI\^-[L^ 

NAME: Peroxidases active site signature. 

CONSENSUS: [SGATV]-x(3)-[LIVMA]-R-[LIVMA]-x-[FW]-H-x-[SAC].
NAME: Catalase proximal heme-ligand signature.

CONSENSUS: R-[LIVMFSTAN]-F-[GASTNP]-Y-x-D-[AST]-[QEH].

NAME: Catalase proximal active site signature.

CONSENSUS: [IF]-x-[RH]-x(4)-[EQ]-R-x(2)-H-x(2)-[GAS]-[GASTF]-[GAST].
NAME: Glutathione peroxidases selenocysteine active site.

CONSENSUS: [GN]-[RKHNFYC]-x-[LIVMFC]-[LIVMF](2)-x-N-[VT]-x-[STC]-x-C-[GA]-x-T.

NAME: Glutathione peroxidases signature 2. 
CONSENSUS: [LIV]-[AGD]-F-P-[CS]-[NG]-Q-F. 

NAME: Lipoxygenases iron-binding region signature 1.

CONSENSUS: H-[EQ]-x(3)-H-x-[LM]-[NQRC]-[GST]-H-[LIVMSTAC](3)-E.

NAME: Lipoxygenases iron-binding region signature 2.

CONSENSUS: [LIVMA]-H-P-[LIVM]-x-[KRQ]-[LIVMF](2)-x-[AP]-H.

NAME: Extradiol ring-cleavage dioxygenases signature.

CONSENSUS: [GNTIV]-x-H-x(5,7)-[LIVMF]-Y-x(2)-[DENTA]-P-x-[GP]-x(2,3)-E.
NAME: Intradiol ring-cleavage dioxygenases signature.

CONSENSUS: [LIVM]-x-G-x-[LIVM]-x(4)-[GS]-x(2)-[LIVM]-x(4)-[LIVM]-[DE]-[LIVMFY]-
CONSENSUS: x(6)-G-x-[FY].

NAME: Indoleamine 2,3-dioxygenase signature 1. 
CONSENSUS: G-G-S-[AN]-[GA]-Q-S-S-x(2)-Q. 

NAME: Indoleamine 2,3-dioxygenase signature 2. 

CONSENSUS: [FY]-L-[DQ]-[DE]-[LIVM]-x(2)-Y-M-x(3)-H-[KR].

NAME: Bacterial ring hydroxylating dioxygenases alpha-subunit signature.
CONSENSUS: C-x-H-R-[GA]-x(8)-G-N-x(5)-C-x-[FY]-H.

NAME: Bacterial luciferase subunits signature.

CONSENSUS: [GA]-[LIVM]-P-[LIVM]-x-[LIVMFY]-x-W-x(6)-[RK]-x(6)-Y-x(3)-[AR].

NAME: ubiH/COQ6 monooxygenase family signature. 
CONSENSUS: H-P-[LIV]-[AG]-G-Q-G-x-N-x-G-x(2)-D.

NAME: Biopterin-dependent aromatic amino acid hydroxylases signature.
CONSENSUS: P-D-x(2)-H-[DE]-[LI]-[LIVMF]-G-H-[LIVMC]-P.

NAME: Copper type II, ascorbate-dependent monooxygenases signature 1.
CONSENSUS: H-H-M-x(2)-F-x-C.

NAME: Copper type II, ascorbate-dependent monooxygenases signature 2.
CONSENSUS: H-x-F-x(4)-H-T-H-x(2)-G.

NAME: Tyrosinase CuA-binding region signature.

CONSENSUS: H-x(4,5)-F-[LIVMFTP]-x-[FW]-H-R-x(2)-[LM]-x(3)-E. 

NAME: Tyrosinase and hemocyanins CuB-binding region signature. 
CONSENSUS: D-P-x-F-[LIVMFYW]-x(2)-H-x(3)-D. 

NAME: Fatty acid desaturases family 1 signature.
CONSENSUS: G-E-x-[FY]-H-N-[FY]-H-H-x-F-P-x-D-Y. 

NAME: Fatty acid desaturases family 2 signature. 

CONSENSUS: [ST]-[SA]-x(3)-[QR]-[LI]-x(5,6)-D-Y-x(2)-[LIVMFYW]-[LIVM]-[DE].

NAME: Cytochrome P450 cysteine heme-iron ligand signature. 
CONSENSUS: [FW]-[SGNH]-x-[GD]-x-[RHPT]-x-C-[LIVMFAP]-[GAD]. 

NAME: Heme oxygenase signature. 
CONSENSUS: L-L-V-A-H-A-Y-T-R. 

NAME: Copper/Zinc superoxide dismutase signature 1. 

CONSENSUS: [GA]-[IFAT]-H-[LIVF]-H-x(2)-[GP]-[SDG]-x-[STAGD].

NAME: Copper/Zinc superoxide dismutase signature 2.
CONSENSUS: G-[GN]-[SGA]-G-x-R-x-[SGA]-C-x(2)-[TV].
NAME: Manganese and iron superoxide dismutases signature.
CONSENSUS: D-x-W-E-H-[STA]-[FY](2).

NAME: Ribonucleotide reductase large subunit signature. 

CONSENSUS: W-x(2)-[LF]-x(6,7)-G-[LIVM]-[FYRA]-[NH]-x(3)-[STAQLIVM]-[ASC]-x(2)-
CONSENSUS: [PA]. 

NAME: Ribonucleotide reductase small subunit signature. 

CONSENSUS: [IVMSEQ]-E-x(1,2)-[LIVTA]-[HY]-[GSA]-x-[STAVM]-Y-x(2)-[LIVMQ]-x(3)-
CONSENSUS: [LIFY]-[IVFYCSA]. 

NAME: Nitrogenases component 1 alpha and beta subunits signature 1. 
CONSENSUS: [LIVMFYH]-[LIVMFST]-H-[AG]-[AGSP]-[LIVMNQA]-[AG]-C.

NAME: Nitrogenases component 1 alpha and beta subunits signature 2. 

CONSENSUS: [STANQ]-[ET]-C-x(5)-G-D-[DN]-[LIVMT]-x-[STAGR]-[LIVMFYST].

NAME: NifH/frxC family signature 1. 

CONSENSUS: E-x-G-G-P-x(2)-[GA]-x-G-C-[AG]-G. 

NAME: NifH/frxC family signature 2. 

CONSENSUS: D-x-L-G-D-V-V-C-G-G-F-[AG]-x-P. 

NAME: Nickel-dependent hydrogenases large subunit signature 1.
CONSENSUS: R-G-[LIVMF]-E-x(15)-[QESM]-R-x-C-G-[LIVM]-C. 

NAME: Nickel-dependent hydrogenases large subunit signature 2. 
CONSENSUS: [FY]-D-P-C-[LIM]-[ASG]-C-x(2,3)-H.

NAME: Glutamyl-tRNA reductase signature. 

CONSENSUS: H-[LIVM]-x(2)-[LIVM]-[GSTAC](3)-[LIVM]-[DEQ]-S-[LIVMA]-[LIVM](2)-[GF]-E-
CONSENSUS: x-[QR]-[IV]-[Lm*[STAG]-Q-[LIVM]-[KR].

NAME: Bacterial-type phytoene dehydrogenase signature. 

CONSENSUS: [NG]-x-[FYWV]-[LIVMF]-x-G-[AGC]-[GS]-[TA]-[HQT]-P-G-[STAV]-G-[LIVM]-
CONSENSUS: x(5)-[GS]. 

NAME: Glycine radical signature. 

CONSENSUS: [STIV]-x-R-[IVTHCSA]-G-Y-x-[GACV]. 

NAME: Ergosterol biosynthesis ERG4/ERG24 family signature 1.
CONSENSUS: G-x(2)-[LIVM]-Y-D-x-[FY]-x-G-x(2)-L-N-P-R. 

NAME: Ergosterol biosynthesis ERG4/ERG24 family signature 2. 
CONSENSUS: [LIVM](2)-H-R-x(2)-R-D-x(3)-C-x(2)-K-Y-G.

NAME: NNMT/PNMT/TEMT family of methyltransferases signature.
CONSENSUS: L-I-D-I-G-S-G-P-T-[IV]-Y-Q-L-L-S-A-C.

NAME: RNA methyltransferase trmA family signature 1.

CONSENSUS: [DN]-P-[PA]-R-x-G-x(14,16)-[LIVM](2)-Y-x-S-C-N-x(2)-T.

NAME: RNA methyltransferase trmA family signature 2.
CONSENSUS: [LIVMF]-D-x-F-P-[QHY]-[ST]-x-H-[LIVMFY]-E.

NAME: Thymidylate synthase active site. 

CONSENSUS: R-x<2)-[LIVM]-x(3)-[FWHQ>n-x(8W 

CONSENSUS: x-[LV]. 

NAME: Ribosomal RNA adenine dimethylases signature. 
CONSENSUS: [LIVM]-[LIVMFY]-[DE]-x-G-[STAPV]-G^
CONSENSUS: x(6)-[LIVMY]-x-[STAGV]-[LIVMFYHC]-E-x-D.

NAME: Methylated-DNA-protein-cysteine methyltransferase active site.
CONSENSUS: [LIVMF]-P-C-H-R-[LIVMF](2).

NAME: N-6 Adenine-specific DNA methylases signature.
CONSENSUS: [LIVMAC]-[LIVFYWA]-x-[DN]-P-P-[FYW].

NAME: N-4 cytosine-specific DNA methylases signature.
CONSENSUS: [LIVMF]-T-S-P-P-[FY].

NAME: C-5 cytosine-specific DNA methylases active site. 

CONSENSUS: [DENKS]-x-[FLIV]-x(2)-[GSTC]-x-P-C-x(2)-[FYWLIM]-S.

NAME: C-5 cytosine-specific DNA methylases C-terminal signature.

CONSENSUS: [RKQGTF]<2)-G-N-[STAG]-[LIVMf^-x(3)4LIVNni-x(3)-[LIVM]-x(3)-[LIVM]. 

NAME: Protein-L-isoaspartate(D-aspartate) O-methyltransferase signature.
CONSENSUS: [GSA]-D-G-x(2)-G-[FYWV]-x(3)-[AS]-P-[FY]-[DN]-x-I.

NAME: Uroporphyrin-III C-methyltransferase signature 1. 

CONSENSUS: [LIVM]-[GS]-[STAL]^-P-G-x(3)-[LIVMFY]-[LIVM]-T-[LIVM]-[KRHQG]-[A^^
NAME: Uroporphyrin-III C-methyltransferase signature 2. 

CONSENSUS: V-x(2)-[LI]-x(2)-G-D-x(3)-[FYW]-[GS]-x(8)-[LIVF]-x(5,6)-[LIVMFYWPAC]-
CONSENSUS: x-[LIVMY]-x-P-G.

NAME: ubiE/COQ5 methyltransferase family signature 1.
CONSENSUS: Y-D-x-M-N-x(2)-[LIVM]-S-x(3)-H-x(2)-W. 

NAME: ubiE/COQ5 methyltransferase family signature 2. 

CONSENSUS: R-V-[LIVM]-K-[PV]-G-G-x-[LIVMF]-x(2)-[LIVM]-E-x-S.

NAME: Serine hydroxymethyltransferase pyridoxal-phosphate attachment site.

CONSENSUS: [DEH]-[LIVMFY]-x-[STMV]-[GST]-[ST](2)-H-K-[ST]-[LF]-x-G-[PAC]-[RQ]-

CONSENSUS: [GSA]-[GA].

NAME: Phosphoribosylglycinamide formyltransferase active site.

CONSENSUS: G-x-[STM]-[IVT]-x-[FYWVQ]-[VMAT]-x-[DEVM]-x-[LIVMY]-D-x-G-x(2)-[LIVT]-
CONSENSUS: x(6)-[LIVM]. 

NAME: Aspartate and ornithine carbamoyltransferases signature. 
CONSENSUS: F-x-[EK]-x-S-[GT]-R-T. 

NAME: Transketolase signature 1. 

CONSENSUS: R-x(3)-[LIVMTA]-[DENQSTHKF]-x(5,6)-[GSN]-G-H-[PLIVMF]-[GSTA]-x(2)-
CONSENSUS: [LIMC]-[GS]. 

NAME: Transketolase signature 2. 

CONSENSUS: G-[DEQGSA]-[DN]-G-[PAEQ]-[ST]-[HQ]-x-[PAGM]-[LIVMYAC]-[DEFYW]-x(2)-
CONSENSUS: [STAP]-x(2)-[RGA]. 

NAME: Transaldolase signature 1. 

CONSENSUS: [DG]-[IVSA]-T-[ST]-N-P-[STA]-[LIVMF](2).
NAME: Transaldolase active site.

CONSENSUS: [LIVM]-x-[LIVM]-K-[LIVM]-[PAS]-x-[ST]-x-[DENQPAS]-G-[LIVM]-x-[AGV]-x-
CONSENSUS: [QEKRST]-x-[LIVM].

NAME: Acyltransferases ChoActase / COT / CPT family signature 1. 
CONSENSUS: [LI]-P-x-[LVP]-P-[IVTA]-P-x-[LIVM]-x-^^^

NAME: Acyltransferases ChoActase / COT / CPT family signature 2. 

CONSENSUS: R-[FYW]-x-[DA]-[KA]-x(0,1)-[LIVMFY]-x-[LIVMFY](2)-x(3)-[DNS]-[GSA]-x(6)-
CONSENSUS: [DE]-[HS]-x(3)-[DE]-[GA]. 

NAME: Thiolases acyl-enzyme intermediate signature. 

CONSENSUS: [LIVM]-[NST]-x(2)-C-[SAGLI]-[ST]-[SAG]-[LIVMFYNS]-x-[STAG]-[LIVM]-x(6)-
CONSENSUS: [LIVM]. 

NAME: Thiolases signature 2. 

CONSENSUS: N-x(2)-G-G-x-[LIVM]-[SA]-x-G-H-P-x-G-x-[ST]-G.
NAME: Thiolases active site.

CONSENSUS: [AG]-[LIVMA]-[STAGLIVM]-[STAG]-[LIVMA]-C-x-[AG]-x-[AG]-x-[AG]-x-[SAG].

NAME: Chloramphenicol acetyltransferase active site.
CONSENSUS: Q-[LIV]-H-H-[SA]-x(2)-D-G-[FY]-H.

NAME: Hexapeptide-repeat containing-transferases signature.

CONSENSUS: [LIV]-[GAED]-x(2)-[STAV]-x-[LIV]-x(3)-[LIVAC]-x-[LIV]-[GAED]-x(2)-
CONSENSUS: [STAVR]-x-[LIV]-[GAED]-x(2)-[STAV]-x-[LIV]-x(3)-[LIV].

NAME: Beta-ketoacyl synthases active site. 

CONSENSUS: G-x(4)-[LIVMFAP]-x(2)-[AGC]-C-[STA]^

NAME: Chalcone and stilbene synthases active site. 
CONSENSUS: R-[LIVMFYS]-x-[LIVM]-x-[QHG]-x-G-C-[FYNA]-[GA]-G-[GA]-[STAV]-x-[LIVMF]-
CONSENSUS: [RA]. 

NAME: Myristoyl-CoA: protein N-myristoyltransferase signature 1.
CONSENSUS: E-I-N-F-L-C-x-H-K. 

NAME: Myristoyl-CoA: protein N-myristoyltransferase signature 2. 
CONSENSUS: K-F-G-x-G-D-G. 

NAME: Gamma-glutamyltranspeptidase signature.

CONSENSUS: T-[STA]-H-x-[ST]-[LIVMA]-x(4)-G-[SN]-x-V-[STA]-x-T-x-T-[LIVM]-[NE]-
CONSENSUS: x(1,2)-[FY]-G.

NAME: Transglutaminases active site. 

CONSENSUS: [GT]-Q-[CA]-W-V-x-[SA]-[GA]-[IVT]-x(2)-T-x-[LMSC]-R-[CSA]-[LV]-G.

NAME: Phosphorylase pyridoxal-phosphate attachment site.
CONSENSUS: E-A-[SC]-G-x-[GS]-x-M-K-x(2)-[LM]-N.

NAME: UDP-glycosyltransferases signature. 
CONSENSUS: [W]-x(2)^x(2)-[LIVMYA]-[LIMV]^ 

CONSENSUS: [HNQ]-[STAGC]-G-x(2)-[STAG]-x(3)-[STAGL]-[LIVMFA]-x(4)-[PQR]-[LIVMT]-
CONSENSUS: x(3)-[PA]-x(3)-[DES]-[QEHN].

NAME: Purine/pyrimidine phosphoribosyltransferases signature.

CONSENSUS: [LIVMFYWCTA]-[LIVM]-[LIVMA]-[LIVMFC]-[DE]-D-[LIVMS]-[LIVM]-[STAVD]-
CONSENSUS: [STAR]-[GAC]-x-[STAR].

NAME: Glutamine amidotransferases class-I active site.
CONSENSUS: [PASHLIVMFni-[LIVMr^-G-[LIVMFYK4 

NAME: Glutamine amidotransferases class-II active site.
CONSENSUS: <x(0,11)-C-[GS]-[IV]-[LIVMFYW]-[AG].

NAME: Purine and other phosphorylases family 1 signature.
CONSENSUS: [GST]-x-G-[LIVM]-G-x-[PA]-S-x-[GSTA]-I-x(3)-E-L.

NAME: Purine and other phosphorylases family 2 signature. 
CONSENSUS: [LIV]-x(3)-G-x(2)-H-x-[LIVMinn^

CONSENSUS: [ATV]-x(4)-[GN]-x(3,4)-[LIVMF](2)-x(2)-[STN]-[SA]-x-G-[GS]-[LIVM].

NAME: Thymidine and pyrimidine-nucleoside phosphorylases signature.
CONSENSUS: S-[GS]-R-[GA]-[LIV]-x(2)-[TA]-[GA]-G-T-x-D-x-[LIV]-E.

NAME: ATP phosphoribosyltransferase signature.

CONSENSUS: E-x(5)-G-x-[SAG]-x(2)-[IV]-x-D-[LIV]-x(2)-[ST]-G-x-T-[LM].

NAME: NAD:arginine ADP-ribosyltransferases signature.
CONSENSUS: [FY]-x-[FY]-K-x(2)-H-[FY]-x-L-[ST]-x-A.

NAME: Prolipoprotein diacylglyceryl transferase signature.
CONSENSUS: G-R-x-[GA]-N-F-[LIVMF]-N-x-E-x(2)-G.

NAME: S-adenosylmethionine synthetase signature 1. 
CONSENSUS: G-A-G-D-Q-G-x(3)-G-Y. 

NAME: S-adenosylmethionine synthetase signature 2. 
CONSENSUS: G-[GA]-G-[ASC]-F-S-x-K-[DE].

NAME: Polyprenyl synthetases signature 1.

CONSENSUS: [LIVM](2)-x-D-D-x(2,4)-D-x(4)-R-R-[GH].

NAME: Polyprenyl synthetases signature 2.

CONSENSUS: [LIVMFY]-G-x(2)-[FYL]-Q-[LIVM]-x-D-D-[LIVMFY]-x-[DNG].
NAME: Squalene and phytoene synthases signature 1.

CONSENSUS: Y-[CSAM]-x(2)-[VSG]-A-[GSA]-[LIVAT]-[IV]-G-x(2)-[LMSC]-x(2)-[LIV].
NAME: Squalene and phytoene synthases signature 2.

CONSENSUS: [LIVM]-G-x(3)-Q-x(2,3)-N-[IF]-x-R-D-[LIVMFY]-x(2)-[DE]-x(4,7)-R-x-[FY]-
CONSENSUS: x-P.

NAME: Protein prenyltransferases alpha subunit repeat signature. 

CONSENSUS: [PSIAV]-x-[NDFV]-[NEQIY]-x-[LIVMAGP]-W-[NQSTHF]-[FYHQ]-[LIVMR].
NAME: Riboflavin synthase alpha chain family signature.

CONSENSUS: [LIVMF]-x(5)-G-[STADNQ]-[KREQIYW]-V-N-[LIVM]-E.

NAME: Dihydropteroate synthase signature 1.

CONSENSUS: [LIVM]-x-[AG]-[LIVMF](2)-N-x-T-x-D-S-F-x-D-x-[SG].
NAME: Dihydropteroate synthase signature 2.

CONSENSUS: [GE]-[SA]-x-[LIVM](2)-D-[LIVM]-G-[GP]-x(2)-[STA]-x-P.
NAME: EPSP synthase signature 1.

CONSENSUS: [LIVM]-x(2)-[GN]-N-[SA]-G-T-[STA]-x-R-x-[LIVMY]-x-[GSTA].
NAME: EPSP synthase signature 2.

CONSENSUS: [KR]-x-[KH]-E-[CST]-[DNE]-R-[LIVM]-x-[STA]-[LIVMC]-x(2)-[EN]-[LIVMF]-x-
CONSENSUS: [KRA]-[LIVMF]-G.

NAME: FLAP/GST2/LTC4S family signature. 
CONSENSUS: G-x(3)-F-E-R-V-[FY]-x-A-[NQ]-x-N-C. 

NAME: Aminotransferases class-I pyridoxal-phosphate attachment site. 

CONSENSUS: [GS]-[LIVMFYTAC]-[GSTA]-K-x(2)-[GSALVN]-[LIVMFA]-x-[GNAR]-x-R-[LIVMA]-
CONSENSUS: [GA].

NAME: Aminotransferases class-II pyridoxal-phosphate attachment site.

CONSENSUS: T-[LIVMFYW]-[STAG]-K-[SAG]-[LIVMFYWR]-[SAG]-x(2)-[SAG]. 

NAME: Aminotransferases class-III pyridoxal-phosphate attachment site.

CONSENSUS: [LIVMFYWC](2)-x-D-E-[LIVMA]-x(2)-[GP]-x(0,1)-[LIVMFYWAG]-x(0,1)-[SACR]-x-
CONSENSUS: [GSAD]-x(12,16)-D-[LIVMFYWC]-x(2,3)-[GSA]-K-x(3)-[GSTADN]-[GSA].

NAME: Aminotransferases class-IV signature.

CONSENSUS: E-x-[STAGCI]-x(2)-N-[LIVMFAC]-[FY]-x(6,12)-[LIVMF]-x-T-x(6,8)-[LIVM]-x-
CONSENSUS: [GS]-[LIVM]-x-[KR].

NAME: Aminotransferases class-V pyridoxal-phosphate attachment site.

CONSENSUS: [LIVFYCHT]-[DGH]-[LIVMFYAC]-[LIVMFYA]-x(2)-[GSTAC]-[GSTA]-[HQR]-K-
CONSENSUS: x(4,6)-G-x-[GSAT]-x-[LIVMFYSAC].

NAME: Hexokinases signature. 

CONSENSUS: [LIVM}-G-F-n>I]-F-S-[Fn-P-x(5V^ 

CONSENSUS: [LF].

NAME: Galactokinase signature. 

CONSENSUS: G-R-x-N-[LIV]-I-G-E-H-x-D-Y. 

NAME: GHMP kinases putative ATP-binding domain. 

CONSENSUS: [LIVM]-[PK]-x-[GSTA]-x(0,1)-G-L-[GS]-S-S-[GSA]-[GSTAC].

NAME: Phosphofructokinase signature.

CONSENSUS: [RK]-x(4)-G-H-x-Q-[QR]-G-G-x(5)-D-R.

NAME: pfkB family of carbohydrate kinases signature 1.
CONSENSUS: [AG]-G-x(0,1)-[GAP]-x-N-x-[STA]-x(6)-[GS]-x(9)-G.

NAME: pfkB family of carbohydrate kinases signature 2. 

CONSENSUS: [DNSK]-[PSTV]-x-[SAG](2)-[GD]-D-x(3)-[SAGV]-[AG]-[LIVMFY]-[LIVMSTAP].
NAME: ROK family signature.

CONSENSUS: [LIVM]-x(2)-G-[LIVMFCT]-G-x-[GA]-[LIVMFA]-x(8)-G-x(3,5)-[GATP]-x(2)-
CONSENSUS: G-[RKH].

NAME: Phosphoribulokinase signature. 

CONSENSUS: K-[LIVM]-x-R-D-x(3)-R-G-x-[ST]-x-E. 

NAME: Thymidine kinase cellular-type signature. 

CONSENSUS: [GA]-x(1,2)-[DE]-x-Y-x-[STAP]-x-C-[NKR]-x-[CH]-[LIVMFYWH].
NAME: FGGY family of carbohydrate kinases signature 1.

CONSENSUS: [MFYGS]-x-[PST]-x(2)-K-[LIVMFYW]-x-W-[LIVMF]-x-[DENQTKR]-[ENQH].
NAME: FGGY family of carbohydrate kinases signature 2. 

CONSENSUS: [GSAJ-x-lLrVMFYV^-x-G-fLrVMl^T.SJ-tHDENQl-lLIVMFl-xaJ-fASJ-lSTAJVM]- 
CONSENSUS: [LIVMFY]-[DEQ].

NAME: Protein kinases ATP-binding region signature.
CONSENSUS: [LIV]-G-{P}^-{PHFYWMGSTNHMSGA]-{PW 
CONSENSUS: x(5,18)-[LIVMFYWCSTAR]-[AIVP]-[LIVMFAGCKR]-K.

NAME: Serine/Threonine protein kinases active-site signature. 

CONSENSUS: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3).

NAME: Tyrosine protein kinases specific active-site signature.

CONSENSUS: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3).

NAME: Protein kinase domain profile. 

NAME: Casein kinase II regulatory subunit signature. 

CONSENSUS: C-P-x-[LIVMY]-x-C-x(5)-L-P-[LIVMC]-G-x(9)-V-[KR]-x(2)-C-P-x-C.
NAME: Pyruvate kinase active site signature.

CONSENSUS: [LIVAC]-x-[LIVM](2)-[SAPCV]-K-[LIV]-E-[NKRST]-x-[DEQH]-[GSTA]-[LIVM].
NAME: Shikimate kinase signature.

CONSENSUS: [KR]-x(2)-E-x(3)-[LIVMF]-x(8,12)-[LIVMF](2)-[SA]-x-G(3)-x-[LIVMF].

NAME: Prokaryotic diacylglycerol kinase signature.
CONSENSUS: E-x-[LIVM]-N-[ST]-[SA]-[LIV]-E-x(2)-V-D.

NAME: Phosphatidylinositol 3- and 4-kinases signature 1.

CONSENSUS: [LIVMFAC]-K-x(1,3)-[DEA]-[DE]-[LIVMC]-R-Q-[DE]-x(4)-Q.
NAME: Phosphatidylinositol 3- and 4-kinases signature 2.

CONSENSUS: [GS]-x-[AV]-x(3)-[LIVM]-x(2)-[FYH]-[LIVM](2)-x-[LIVMF]-x-D-R-H-x(2)-N.

NAME: Acetate and butyrate kinases family signature 1.
CONSENSUS: [LIVM](2)-x-[LIVM]-N-x-G-S-[ST]-S-x-[KE].

NAME: Acetate and butyrate kinases family signature 2. 

CONSENSUS: [LIVMA](2)-x(2)-H-x-G-x-G-x-[ST]-[LIVM]-x-[AV]-x(3)-G. 

NAME: Phosphoglycerate kinase signature.

CONSENSUS: [KRHGTCV]-[VT]-[LIVMF]-[LIVMC]-R-x-D-x-N-[SACV]-P. 
NAME: Aspartokinase signature. 

CONSENSUS: [LIVM]-x-K-[FY]-G-G-[ST]-[SC]-[LIVM]. 
NAME: Glutamate 5-kinase signature. 

CONSENSUS: [GSTN]-x(2)-G-x-G-[GC]-[IM]-x-[STA]-K-[LIVM]-x-[SA]-[TCA]-x(2)-[GALV]-
CONSENSUS: x(3)-G. 

NAME: ATP:guanido phosphotransferases active site. 
CONSENSUS: C-P-x(0,1)-[ST]-N-[IL]-G-T.

NAME: PTS HPR component histidine phosphorylation site signature.
CONSENSUS: G-[LIVM]-H-[STA]-R-[PA]-[GSTA]-[STAM].

NAME: PTS HPR component serine phosphorylation site signature. 

CONSENSUS: [GSADE]-[KREQTV]-x(4)-[KRN]-S-[LIVMF](2)-x-[LIVM]-x(2)-[LIVM]-[GAD].

NAME: PTS EIIA domains phosphorylation site signature 1.
CONSENSUS: G-x(2)-[LIVMF](3)-H-[LIVMF]-G-[LIVMF]-x-T-[ALV].

NAME: PTS EIIA domains phosphorylation site signature 2.

CONSENSUS: [DENQ]-x(6)-[LIVMF]-[GA]-x(2)-[LIVM]-A-[LIVM]-P-H-[GAC].

NAME: PTS EIIB domains cysteine phosphorylation site signature.

CONSENSUS: N-[LIVMFY]-x(5)-C-x-T-R-[LIVM]-x-[LIVMF]-x-[LIVM]-x-[DQ].

NAME: Adenylate kinase signature.

CONSENSUS: [LIVMFYW](3)-D-G-[FYI]-P-R-x(3)-[NQ].

NAME: Nucleoside diphosphate kinases active site. 
CONSENSUS: N-x(2)-H-[GA]-S-D-[SA]-[LIVMPKNE]. 

NAME: Guanylate kinase signature. 

CONSENSUS: T-[ST]-R-x(2)-[KR]-x(2)-[DE]-x(2)-G-x(2)-Y-x-[FY]-[LIVMK]. 
NAME: Guanylate kinase domain profile.

NAME: Phosphoribosyl pyrophosphate synthetase signature. 

CONSENSUS: D-[LI]-H-[SA]-x-Q-[IMST]-[QM]-G-[FY]-F-x(2)-P-[LIVMFC]-D.

NAME: 7,8-dihydro-6-hydroxymethylpterin-pyrophosphokinase signature.
CONSENSUS: G-[PE]-R-x(2)-D-L-D-[LIVM](2).

NAME: Bacteriophage-type RNA polymerase family active site signature 1.
CONSENSUS: P-[LIVM]-x(2)-D-[GA]-[ST]-[AC]-[SN]-[GA]-[LIVMFY]-Q.

NAME: Bacteriophage-type RNA polymerase family active site signature 2.
CONSENSUS: [LIVMF]-x-R-x(3)-K-x(2)-[LIVMF]-M-[PT]-x(2)-Y.

NAME: Eukaryotic RNA polymerase II heptapeptide repeat.
CONSENSUS: Y-[ST]-P-[ST]-S-P-[STANK].

NAME: RNA polymerases beta chain signature. 

CONSENSUS: G-x-K-[LIVMFA]-[STAC]-[GSTN]-x-[HSTA]-[GS]-[QNH]-K-G-[IVT].
NAME: RNA polymerases M / 15 Kd subunits signature.

CONSENSUS: F-C-x-[DEKST]-C-[GNK]-[DNSA]-[LIVMH]-[LIVM]-x(8,14)-C-x(2)-C.
NAME: RNA polymerases D / 30 to 40 Kd subunits signature.

CONSENSUS: N-[SGA]-[LIVMF]-R-R-x(9)-[SA]-x(3)-V-x(4)-N-x-[STA]-x(3)-[DN]-E-x-[LI]-
CONSENSUS: [GA]-x-R-[LI]-[GA]-[LIVM](2)-P.

NAME: RNA polymerases H / 23 Kd subunits signature.
CONSENSUS: H-[NEI]-[LIVM]-V-P-x-H-x(2)-[LIVM]-x(2)-[DE].

NAME: RNA polymerases K / 14 to 18 Kd subunits signature.
CONSENSUS: [ST]-x-[FY]-E-x-[AT]-R-x-[LIVM]-[GSA]-x-R-[SA]-x-Q.

NAME: RNA polymerases L / 13 to 16 Kd subunits signature. 

CONSENSUS: [DE](2)-H-[ST]-[LIVM]-[GAP]-N-x(11)-V-x-[FM]-x(2)-Y-x(3)-H-P.

NAME: RNA polymerases N / 8 Kd subunits signature. 
CONSENSUS: [LIVMF](2)-P-[LIVM]-x-C-F-[ST]-C-G.

NAME: DNA polymerase family A signature.

CONSENSUS: R-x(2)-[GSAV]-K-x(3)-[LIVMFY]-[AGQ]-x(2)-Y-x(2)-[GS]-x(3)-[LIVMA].
NAME: DNA polymerase family B signature.

CONSENSUS: [YA]-[GLIVMSTAC]-D-T-D-[SG]-[LIVMFTC]-x-[LIVMSTAC].
NAME: DNA polymerase family X signature.

CONSENSUS: G-[SG]-[LFY]-x-R-[GE]-x(3)-[SGCL]-x-D-[LIVM]-D-[LIVMFY](3)-x(2)-[SAP].

NAME: Galactose-1-phosphate uridyl transferase family 1 active site signature.
CONSENSUS: F-E-N-[RK]-G-x(3)-G-x(4)-H-P-H-x-Q.

NAME: Galactose-1-phosphate uridyl transferase family 2 signature.
CONSENSUS: D-L-P-I-V-G-G-[ST]-[LIVM](2)-[SA]-H-[DEN]-H-[FY]-Q-G-G.

NAME: ADP-glucose pyrophosphorylase signature 1.

CONSENSUS: [AG]-G-G-x-G-[STK]-x-L-x(2)-L-[TA]-x(3)-A-x-P-A-[LV].

NAME: ADP-glucose pyrophosphorylase signature 2.
CONSENSUS: W-[FY]-x-G-[ST]-A-[DNSH]-[AS]-[LIVMFYW].

NAME: ADP-glucose pyrophosphorylase signature 3. 

CONSENSUS: [APV]-[GS]-M-G-[LIVMN]-Y-[IVC]-[LIVMFY]-x(2)-[DENPHK].
NAME: Phosphatidate cytidylyltransferase signature.

CONSENSUS: S-x-[LIVMF]-K-R-x(4)-K-D-x-[GSA]-x(2)-[LI]-[PG]-x-H-G-G-[LIVM]-x-D-R-
CONSENSUS: [LIVMFT]-D.

NAME: Ribonuclease PH signature.

CONSENSUS: C-[DE]-[LIVM](2)-Q-[GTA]-D-G-[SG]-x(2)-[TA]-A.

NAME: 2'-5'-oligoadenylate synthetases signature 1.

CONSENSUS: G-G-S-x-[AG]-[KR]-x-T-x-L-[KR]-[GST]-x-S-D-[AG].

NAME: 2'-5'-oligoadenylate synthetases signature 2.
CONSENSUS: R-P-V-I-L-D-P-x-[DE]-P-T.

NAME: CDP-alcohol phosphatidyltransferases signature.
CONSENSUS: D-G-x(2)-A-R-x(8)-G-x(3)-D-x(3)-D. 

NAME: PEP-utilizing enzymes phosphorylation site signature. 

CONSENSUS: G-[GA]-x-[TN]-x-H-[STA]-[STAV]-[LIVM](2)-[STAV]-[RG].

NAME: PEP-utilizing enzymes signature 2.

CONSENSUS: [DEQS]-x-[LIVMF]-S-[LIVMF]-G-[ST]-N-D-[LIVM]-x-Q-[LIVMFYGT]-[STALIV]-
CONSENSUS: [LIVMF]-[GAS]-x(2)-R.

NAME: Rhodanese signature 1.

CONSENSUS: [FY]-x(3)-H-[LIV]-P-G-A-x(2)-[LIVF].
NAME: Rhodanese C-terminal signature.

CONSENSUS: [AV]-x(2)-[FY]-[DEAP]-G-[GSA]-[WF]-x-E-[FYW].
NAME: CoA transferases signature 1. 

CONSENSUS: [DN]-[GN]-x(2)-[LIVMFA](3)-G-G-F-x(3)-G-x-P. 

NAME: CoA transferases signature 2. 

CONSENSUS: [LF]-[HQ]-S-E-N-G-[LIVF](2)-[GA]. 

NAME: Phospholipase A2 histidine active site.
CONSENSUS: C-C-x(2)-H-x(2)-C.

NAME: Phospholipase A2 aspartic acid active site.
CONSENSUS: [LIVMA]-C-{LIVMFYWPCST}-C-D-x(5)-C.

NAME: Lipases, serine active site.

CONSENSUS: [LIV]-x-[LIVFY]-[LIVMST]-G-[HYWV]-S-x-G-[GSTAC].

NAME: Colipase signature. 
CONSENSUS: Y-x(2)-Y-Y-x-C-x-C. 

NAME: Lipolytic enzymes "G-D-S-L" family, serine active site. 
CONSENSUS: [LIVMFYAG](4)-G-D-S-[LIVM]-x(1,2)-[TAG]-G.

NAME: Lipolytic enzymes "G-D-X-G" family, putative histidine active site.
CONSENSUS: [LIVMF](2)-x-[LIVMF]-H-G-G-[SAG]-[FY]-x(3)-[STDN]-x(2)-[ST]-H.

NAME: Lipolytic enzymes "G-D-X-G" family, putative serine active site.
CONSENSUS: [LIVM]-x-[LIVMF]-[SA]-G-D-S-[CA]-G-[GA]-x-L-[CA].

NAME: Carboxylesterases type-B serine active site.

CONSENSUS: F-[GR]-G-x(4)-[LIVM]-x-[LIV]-x-G-x-S-[STAG]-G.

NAME: Carboxylesterases type-B signature 2.

CONSENSUS: [ED]-D-C-L-[YT]-[LIV]-[DNS]-[LIV]-[LIVFYW]-x-[PQR].
NAME: Pectinesterase signature 1.

CONSENSUS: [GSTr^<5)-[LIVM)-x-[LIVM]-x(2)^-x-Y-[DNK]-E-x.rLiVM]-x-[LIVM]. 

NAME: Pectinesterase signature 2. 
CONSENSUS: G-[STAD]-[LIVMT]-D-F-I-F-G. 

NAME: Peptidyl-tRNA hydrolase signature 1. 

CONSENSUS: [FY]-x(2)-T-R-H-N-x-G-x(2)-[LIVMFA](2)-[DE].

NAME: Peptidyl-tRNA hydrolase signature 2.

CONSENSUS: [GS]-x(3)-H-N-G-[LIVM]-[KR]-[DNS]-[LIVMT].

NAME: Alkaline phosphatase active site.

CONSENSUS: [TV]-x-D-S-[GAS]-[GASC]-[GAST]-[GA]-T.

NAME: Histidine acid phosphatases phosphohistidine signature. 

CONSENSUS: [LIVM]-x(2)-[LIVMA]-x(2)-[LIVM]-x-R-H-[GN]-x-R-x-[PAS].

NAME: Histidine acid phosphatases active site signature.

CONSENSUS: [LIVMF]-x-[LIVMFAG]-x(2)-[STAGI]-H-D-[STANQ]-x-[LIVM]-x(2)-[LIVMFY]-x(2)-
CONSENSUS: [STA].

NAME: Class A bacterial acid phosphatases signature.
CONSENSUS: G-S-Y-P-S-G-H-T.
NAME: 5'-nucleotidase signature 1.

CONSENSUS: [LIVM]-A-[LIVM](2)-[HEA]-[TI]-x-D-x-H-[GSA]-x-[LIVMF].

NAME: 5'-nucleotidase signature 2.

CONSENSUS: [FYP]-x(4)-[LIVM]-G-N-H-E-F-[DN].

NAME: Fructose-1-6-bisphosphatase active site.

CONSENSUS: [AG]-[RK]-L-x(1,2)-[LIV]-[FY]-E-x(2)-P-[LIVM]-[GSA].

NAME: Serine/threonine specific protein phosphatases signature.
CONSENSUS: [LIVM]-R-G-N-H-E.

NAME: Protein phosphatase 2A regulatory subunit PR55 signature 1.
CONSENSUS: E-F-D-Y-L-K-S-L-E-I-E-E-K-I-N.

NAME: Protein phosphatase 2A regulatory subunit PR55 signature 2.
CONSENSUS: N-[AG]-H-[TA]-Y-H-I-N-S-I-S-[LIVM]-N-S-D.

NAME: Protein phosphatase 2C signature.

CONSENSUS: [LIVMFY]-[LIVMFYA]-[GSAC]-[LIVM]-[FYC]-D-G-H-[GAV].

NAME: Tyrosine specific protein phosphatases active site.

CONSENSUS: [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY].

NAME: Tyrosine specific protein phosphatases profile. 

NAME: Dual specificity protein phosphatase profile. 

NAME: PTP type protein phosphatase profile. 

NAME: Inositol monophosphatase family signature 1.

CONSENSUS: [FWV]-x(0,1)-[LIVM]-D-P-[LIVM]-D-[SG]-[ST]-x(2)-[FY]-x-[HKRNSTY].
NAME: Inositol monophosphatase family signature 2.

CONSENSUS: [WV]-D-x-[AC]-[GSA]-[GSAPV]-x-[LIVACP]-[LIV]-[LIVAC]-x(3)-[GH]-[GA].

NAME: Prokaryotic zinc-dependent phospholipase C signature.
CONSENSUS: H-Y-x-[GT]-D-[LIVM]-[DNS]-x-P-x-H-[PA]-x-N.

NAME: Phosphatidylinositol-specific phospholipase X-box domain profile.

NAME: Phosphatidylinositol-specific phospholipase Y-box domain profile.

NAME: 3'5'-cyclic nucleotide phosphodiesterases signature.
CONSENSUS: H-D-[LIVMFY]-x-H-x-[AG]-x(2)-[NQ]-x-[LIVMFY].

NAME: cAMP phosphodiesterases class-II signature. 

CONSENSUS: H-x-H-L-D-H-[LIVM]-x-[GS]-[LIVMA]-[LIVM](2)-x-S-[AP].
NAME: Sulfatases signature 1.

CONSENSUS: [SAP]-[LIVMST]-[CS]-[STAC]-P-[STA]-R-x(2)-[LIVMFW](2)-[TR]-G.
NAME: Sulfatases signature 2.

CONSENSUS: G-[YV]-x-[ST]-x(2)-[IVA]-G-K-x(0,1)-[FYWK]-[HL].

NAME: AP endonucleases family 1 signature 1.
CONSENSUS: [APF]-D-[LIVMF](2)-x-[LIVM]-Q-E-x-K.

NAME: AP endonucleases family 1 signature 2.

CONSENSUS: D-[ST]-[FY]-R-[KH]-x(7,8)-[FYW]-[ST]-[FYW](2).

NAME: AP endonucleases family 1 signature 3.

CONSENSUS: N-x-G-x-R-[LIVM]-D-[LIVMFYH]-x-[LV]-x-S.

NAME: AP endonucleases family 2 signature 1. 
CONSENSUS: H-x(2)-Y-[LIVMF]-[IM]-N-[LIVMCA]-[AG].

NAME: AP endonucleases family 2 signature 2. 
CONSENSUS: [GR]-[LIVMF]-C-[LIVM]-D-T-C-H.

NAME: AP endonucleases family 2 signature 3.

CONSENSUS: [LIVMW]-H-x-N-[DE]-[SA]-K-x(3)-G-[SA]-x(2)-D.

NAME: Deoxyribonuclease I signature 1.

CONSENSUS: [LIVM](2)-[AP]-L-H-[STA](2)-P-x(5)-E-[LIVM]-[DN]-x-L-x-[DE]-V.

NAME: Deoxyribonuclease I signature 2. 
CONSENSUS: G-D-F-N-A-x-C-[SA]. 

NAME: Endonuclease III iron-sulfur binding region signature. 
CONSENSUS: C-x(3)-[KRS]-P-[KRAGL]-C-x(2)-C-x(5)-C.

NAME: Endonuclease III family signature.

CONSENSUS: [GST]-x-[LIVMF]-P-x(5)-[LIVMW]-x(2,3)-[LI]-[PAS]-G-V-[GA]-x(3)-[GAC]-
CONSENSUS: x(3)-[LIVM]-x(2)-[SALV]-[LIVMFYW]-[GANK].

NAME: Ribonuclease II family signature. 

CONSENSUS: [HIMFYEMGSTAMHUVM]-x(4,5)-Y-[ST^ 

CONSENSUS: [RQ]-[KR]-[FY]-x-D-x(3)-[HQ]. 

NAME: Ribonuclease III family signature.

CONSENSUS: [DEQ]-[RQ]-[LM]-E-[FYW]-[LV]-G-D-[SAR].

NAME: Bacterial Ribonuclease P protein component signature. 

CONSENSUS: [LIVMFYS]-x(2)-A-x(2)-R-[NH]-[KRQL]-[LIVM]-[KRA]-R-x-[LIVMTA]-[KR].

NAME: Ribonuclease T2 family histidine active site 1.
CONSENSUS: [FYWL]-x-[LIVM]-H-G-L-W-P.

NAME: Ribonuclease T2 family histidine active site 2.

CONSENSUS: [LIVMF]-x(2)-[HDGTY]-[EQ]-[FYW]-x-[KR]-H-G-x-C.

NAME: Pancreatic ribonuclease family signature. 
CONSENSUS: C-K-x(2)-N-T-F. 

NAME: DNA/RNA non-specific endonucleases active site.
CONSENSUS: D-R-G-H-[QIL]-x(3)-A.

NAME: Thermonuclease family signature 1.

CONSENSUS: D-G-D-T-[LIVM]-x-[LIVMC]-x(9,10)-R-[LIVM]-x(2)-[LIVM]-D-x-P-E.

NAME: Thermonuclease family signature 2.

CONSENSUS: D-[KR]-Y-[GQ]-R-x-[LV]-[GA]-x-[IV]-[FYW].

NAME: Beta-amylase active site 1. 
CONSENSUS: H-x-C-G-G-N-V-G-D. 

NAME: Beta-amylase active site 2. 

CONSENSUS: G-x-[SA]-G-E-[LIVM]-R-Y-P-S-Y.

NAME: Glucoamylase active site region signature.
CONSENSUS: [STN]-[GP]-x(1,2)-[DE]-x-W-E-E-x(2)-[GS].

NAME: Polygalacturonase active site.

CONSENSUS: [GSDENKRH]-x(2)-[VMFC]-x(2)-[GS]-H-G-[LIVMAG]-x(1,2)-[LIVM]-G-S.
NAME: Clostridium cellulosome enzymes repeated domain signature. 

CONSENSUS: D-[LIVMFY]-[DNV]-x-[DNS]-x(2)-[LIVM]-[DN]-[SALM]-x-D-x(3)-[LIVMF]-x-
CONSENSUS: [RKS]-x-[LIVMF]. 

NAME: Chitinases family 18 active site. 

CONSENSUS: [LIVMFY]-[DN]-G-[LIVMF]-[DN]-[LIVMF]-[DN]-x-E.
NAME: Chitinases family 19 signature 1. 

CONSENSUS: C-x(4,5)-F-Y-[ST]-x(3)-[FY]-[LrvT4H-x-A-x(3)-[YF]-x(2)-F-[GSA]. 
NAME: Chitinases family 19 signature 2. 

CONSENSUS: [LIVM]-[GSA]-F-x-[STAG](2)-[LIVMFY]-W-[W

NAME: Alpha-lactalbumin / lysozyme C signature. 
CONSENSUS: C-x(3)-C-x(2)-[LMF]-x(3)-[DEN]-[LI]-x(5)-C.

NAME: Alpha-galactosidase signature.

CONSENSUS: G-[LIVMFY]-x(2)-[LIVMFY]-x-[LIVM]-D-D-x-W-x(3,4)-R-[DNSF].
NAME: Trehalase signature 1.
CONSENSUS: P-G-G-R-F-x-E-x-Y-x-W-D-x-Y.

NAME: Trehalase signature 2.

CONSENSUS: Q-W-D-x-P-x-[GA]-W-[PA]-P.

NAME: Alpha-L-fucosidase putative active site. 
CONSENSUS: P-x(2)-L-x(3)-K-W-E-x-C. 

NAME: Glycosyl hydrolases family 1 active site.

CONSENSUS: [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN].
NAME: Glycosyl hydrolases family 1 N-terminal signature.

CONSENSUS: F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA].
NAME: Glycosyl hydrolases family 2 signature 1.

CONSENSUS: N-x-[LIVMFYWD]-R-[STACN](2)-H-Y-P-x(4)-[LIVMFYW](2)-x(3)-[DN]-x(2)-
CONSENSUS: G-[LIVMFYW](4).

NAME: Glycosyl hydrolases family 2 acid/base catalyst.

CONSENSUS: [DENQF]-[KRVW]-N-H-[AP]-[SAC]-[LIVMF](3)-W-[GS]-x(2,3)-N-E.
NAME: Glycosyl hydrolases family 3 active site. 

CONSENSUS: [LIVM](2)-[KR]-x-[EQK]-x(4)-G-[LIVMFT]-[LIVT]-[LIVMF]-[ST]-D-x(2)-
CONSENSUS: [SGADNI].

NAME: Glycosyl hydrolases family 5 signature. 

CONSENSUS: [LIV]-[LIVMFYWGA](2)-[DNEQG]-[LIVMGST]-x-N-E-[PV]-[RHDNSTLIVFY]. 

NAME: Glycosyl hydrolases family 6 signature 1. 

CONSENSUS: V-x-Y-x(2)-P-x-R-D-C-[GSAF]-x(2)-[GSA](2)-x-G.

NAME: Glycosyl hydrolases family 6 signature 2.

CONSENSUS: [LIVMYA]-[LIVA]-[LIVT]-[LIV]-E-P-D-[SAL]-[LI]-[PSAG].
NAME: Glycosyl hydrolases family 8 signature.

CONSENSUS: A-[ST]-D-[AG]-D-x(2)-[IM]-A-x-[SA]-[LIVM]-[LIVMG]-x-A-x(3)-[FW].

NAME: Glycosyl hydrolases family 9 active sites signature 1.

CONSENSUS: [STV]-x-[LIVMFY]-[STV]-x(2)-G-x-[NKR]-x(4)-[PLIVM]-H-x-R.

NAME: Glycosyl hydrolases family 9 active sites signature 2.
CONSENSUS: [FYW]-x-D-x(4)-[FYW]-x(3)-E-x-[STA]-x(3)-N-[STA].

NAME: Glycosyl hydrolases family 10 active site.

CONSENSUS: [GTA]-x(2)-[LIVN]-x-[IVMF]-[ST]-E-[LIY]-[DN]-[LIVMF].

NAME: Glycosyl hydrolases family 11 active site signature 1.
CONSENSUS: [PSA]-[LQ]-x-E-Y-Y-[LIVM](2)-[DE]-x-[FYWHN].

NAME: Glycosyl hydrolases family 11 active site signature 2.

CONSENSUS: [LIVMF]-x(2)-E-[AG]-[YWG]-[QRFGS]-[SG]-[STAN]-G-x-[SAF].

NAME: Glycosyl hydrolases family 16 active sites.

CONSENSUS: E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA].
NAME: Glycosyl hydrolases family 17 signature.

CONSENSUS: [LIVM]-x-[LIVMFYWA](3)-[STAG]-E-[STA]-G-W-P-[STN]-x-[SAGQ].
NAME: Glycosyl hydrolases family 25 active sites signature.

CONSENSUS: D-[LIVM]-x(3)-[NQ]-[PG]-x(9,10)-G-x(4)-[LIVMFY](2)-K-x-[ST]-E-[GS]-x(2)-
CONSENSUS: Y-x-[DN].

NAME: Glycosyl hydrolases family 31 active site.
CONSENSUS: [GF]-[LIVMF]-W-x-D-M-[NSA]-E.

NAME: Glycosyl hydrolases family 31 signature 2. 
CONSENSUS: G-[AV]-D-[LIVMTl^-G-[r^-x(3MST^^ 
CONSENSUS: F-x-P-F-x-R-[DN].

NAME: Glycosyl hydrolases family 32 active site.
CONSENSUS: H-x(2)-P-x(4)-[LIVM]-N-D-P-N-G.

NAME: Glycosyl hydrolases family 35 putative active site.
CONSENSUS: G-G-P-[LIVM](2)-x(2)-Q-x-E-N-E-[FY].

NAME: Glycosyl hydrolases family 39 active site.
CONSENSUS: W-x-F-E-x-W-N-E-P-[DN]. 

NAME: Glycosyl hydrolases family 45 active site. 
CONSENSUS: [STA]-T-R-Y-[FYW]-D-x(5)-[CA]. 

NAME: Prokaryotic transglycosylases signature. 
CONSENSUS: [LIVM]-x<3)-E-S-x(3MAP]-x<3)-S-x(5^ 
CONSENSUS: x(4)-[SAG]. 

NAME: Inosine-uridine preferring nucleoside hydrolase family signature. 
CONSENSUS: D-x-D-[PT]-[GA]-x-D-D-[TAV]-[VI]-A.

NAME: Alkylbase DNA glycosidases alkA family signature.

CONSENSUS: G-I-G-x-W-[ST]-[AV]-x-[LIVMFY](2)-x-[LIVM]-x(8)-[MF]-x(2)-[ED]-D.
NAME: Formamidopyrimidine-DNA glycosylase signature. 

CONSENSUS: C-x(2,4)-C-x-[GTAQ]-x-[IV]-x(7)-R-[GSTAN]-[STA]-x-[FYI]-C-x(2)-C-Q.

NAME: Uracil-DNA glycosylase signature. 

CONSENSUS: [KR]-[LIV]-[LIVC]-[LIVM]-x-G-[QI]-D-P-Y.

NAME: S-adenosyl-L-homocysteine hydrolase signature 1.

CONSENSUS: [CS]-N-x-[FVL]-S-[ST]-[QA]-[DEN]-x-[AV](2)-A-A-[LIV]-[SAV].

NAME: S-adenosyl-L-homocysteine hydrolase signature 2.
CONSENSUS: G-K-x(3)-[LIV]-x-G-Y-G-x-V-G-[KR]-G-x-A.

NAME: Cytosol aminopeptidase signature. 
CONSENSUS: N-T-D-A-E-G-R-L. 

NAME: Aminopeptidase P and proline dipeptidase signature. 

CONSENSUS: [HA]-[GSYR]-[LIVMT]-[SG]-H-x-[LIV]-G-[LIVM]-x-[IV]-H-[DE].
NAME: Methionine aminopeptidase subfamily 1 signature.

CONSENSUS: [MFY]-x-G-H-G-[LIVMC]-[GSH]-x(3)-H-x(4)-[LIVM]-x-[HN]-[YWV].
NAME: Methionine aminopeptidase subfamily 2 signature.

CONSENSUS: [DA]-[LIVMY]-x-K-[LIVM]-D-x-G-x-[HQ]-[LIVM]-[DNS]-G-x(3)-[DN].
NAME: Renal dipeptidase active site.

CONSENSUS: [LIVM]-E-G-[GA]-x(2)-[LIVMF]-x(6)-L-x(3)-Y-x(2)-G-[LIVM]-R.

NAME: Serine carboxypeptidases, serine active site.
CONSENSUS: [LIVM]-x-[GTA]-E-S-Y-[AG]-[GS].

NAME: Serine carboxypeptidases, histidine active site. 

CONSENSUS: [LIVF]-x(2)-[LIVSTA]-x-[IVPST]-x-[GSDNQL]-[SAGV]-[SG]-H-x-[IVAQ]-P-x(3)-
CONSENSUS: [PSA].

NAME: Zinc carboxypeptidases, zinc-binding region 1 signature.

CONSENSUS: [PK]-x-[LIVMFY]-x-[LIVMFY]-x(4)-H-[STAG]-x-E-x-[LIVM]-[STAG]-x(6)-
CONSENSUS: [LIVMFYTA].

NAME: Zinc carboxypeptidases, zinc-binding region 2 signature.
CONSENSUS: H-[STAG]-x(3)-[LIVME]-x(2)-[LIVMFYW]-P-[FYW].

NAME: Serine proteases, trypsin family, histidine active site.
CONSENSUS: [LIVM]-[ST]-A-[STAG]-H-C.

NAME: Serine proteases, trypsin family, serine active site. 

CONSENSUS: [DNSTAGC]-[GSTAPIMVQH]-x(2)-G-[DE]-S-G-[GS]-[SAPHV]-[LIVMFYWH]-
CONSENSUS: [LIVMFYSTANQH].

NAME: Serine proteases, subtilase family, aspartic acid active site.

CONSENSUS: [STAIV]-x-[LIVMF]-[LIVM]-D-[DSTA]-G-[LIVMFC]-x(2,3)-[DNH].

NAME: Serine proteases, subtilase family, histidine active site.

CONSENSUS: H-G-[STM]-x-[VIC]-[STAGC]-[GS]-x-[LIVMA]-[STAGCLV]-[SAGM].

NAME: Serine proteases, subtilase family, serine active site.
CONSENSUS: G-T-S-x-[SA]-x-P-x(2)-[STAVC]-[AG].

NAME: Serine proteases, V8 family, histidine active site.

CONSENSUS: [ST]-G-[LIVMFYW](3)-[GN]-x(2)-T-[LIVM]-x-T-x(2)-H.

NAME: Serine proteases, V8 family, serine active site. 
CONSENSUS: T-x(2)-[GC]-[NQ]-S-G-S-x-[LIVM]-[FY]. 

NAME: Serine proteases, omptin family signature 1.
CONSENSUS: W-T-D-x-S-x-H-P-x-T.

NAME: Serine proteases, omptin family signature 2.

CONSENSUS: A-G-Y-Q-E-[ST]-R-[FYW]-S-[FYW]-[TN]-A-x-G-G-[ST]-Y.
NAME: Prolyl endopeptidase family serine active site.

CONSENSUS: D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2).
NAME: Endopeptidase Clp serine active site.

CONSENSUS: T-x(2)-[LIVMF]-G-x-A-[SAC]-S-[MSA]-[PAG]-[STA].

NAME: Endopeptidase Clp histidine active site.

CONSENSUS: R-x(3)-[EAP]-x(3)-[LIVMFYT]-M-[LIVM]-H-Q-P.

NAME: ATP-dependent serine proteases, lon family, serine active site.
CONSENSUS: D-G-[PD]-S-A-[GS]-[LIVMCA]-[TA]-[LIVM].

NAME: Eukaryotic thiol (cysteine) proteases cysteine active site. 
CONSENSUS: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV].

NAME: Eukaryotic thiol (cysteine) proteases histidine active site. 

CONSENSUS: [LIVMGSTAN]-x-H-[GSACE]-[LIVM]-x-[LIVMAT](2)-G-x-[GSADNH].

NAME: Eukaryotic thiol (cysteine) proteases asparagine active site. 
CONSENSUS: [F¥rH]-[Wn-[LIVT]-x-rKRQAGh^^ 
CONSENSUS: [LIVMFYG]-x-[LIVMF].

NAME: Ubiquitin carboxyl-terminal hydrolase family 1 cysteine active-site.
CONSENSUS: Q-x(3)-N-[SA]-C-G-x(3)-[LIVM](2)-H-[SA]-[LIVM]-[SA].

NAME: Ubiquitin carboxyl-terminal hydrolases family 2 signature 1.
CONSENSUS: G4UVMrTMl,3MAGCMNASM]-x-C-[FYW]-[L^ 
CONSENSUS: Q. 

NAME: Ubiquitin carboxyl-terminal hydrolases family 2 signature 2. 
CONSENSUS: Y-x-L-x-[SAG]-[LIVMFT]-x(2)-H-x-G-x(4,5)-G-H-Y.

NAME: Caspase family histidine active site. 

CONSENSUS: H-x(2,4)-[SC]-x(4)-[LIVMF](2)-[ST]-H-G.

NAME: Caspase family cysteine active site. 
CONSENSUS: K-P-K-[LIVMF](4)-Q-A-C-[RQG]-G.

NAME: Eukaryotic and viral aspartyl proteases active site. 

CONSENSUS: [LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-x-[LIVMFSTNC]-
CONSENSUS: x-[LIVMFGTA].

NAME: Neutral zinc metallopeptidases, zinc-binding region signature.

CONSENSUS: [GSTALIVN]-x(2)-H-E-[LIVMFYW]-{DEHRKP}-H-x-[LIVMFYWGSPQ].

NAME: Matrixins cysteine switch. 

CONSENSUS: P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ].

NAME: Insulinase family, zinc-binding region signature. 
CONSENSUS: G-x(8,9K>-x-[STAl-H-[UVMFY]-[UVMq^^ 
CONSENSUS: [GSTAN]-[GST].

// 

AC PS01016; 

DE Glycoprotease family signature. 

CONSENSUS: [KR]-[GSAT]-x(4)-[FYWHL]-[DQNGK]-x-P-x-[LIVMFY]-x(3)-H-x(2)-[AG]-H-
CONSENSUS: [LIVM].

NAME: Proteasome A-type subunits signature.
CONSENSUS: [Fn-x(4)-[STrW]-x-[FYW]-S-P-x-G-[^^ 
CONSENSUS: [SAG].

NAME: Proteasome B-type subunits signature.

CONSENSUS: [LIVMA]-[GSA]-[LIVMF]-x-[FYLVGAC]-x(2)-[GSACFY]-[LIVMSTAC](3)-[GAC]-
CONSENSUS: [GSTACV]-[DES]-x(15)-[RK]-x(12,13)-G-x(2)-[GSTA]-D.

NAME: Signal peptidases I serine active site. 
CONSENSUS: [GS]-x-S-M-x-[PS]-[ATHLF]. 

NAME: Signal peptidases I lysine active site. 

CONSENSUS: K-R-[LIVMSTA](2)-G-x-[PG]-G-[DE]-x-[LIVM]-x-[LIVMFY].

NAME: Signal peptidases I signature 3.

CONSENSUS: [LIVMFYW](2)-x(2)-G-D-[NH]-x(3)-[SND]-x(2)-[SG].

NAME: Signal peptidases II signature.

CONSENSUS: [GAF]-[GA]-[GAS]-[LIVM]-[GAS]-N-[LVMFG]-[LIVMFY]-D-R-[LIMFA].

NAME: Peptidase family U32 signature.

CONSENSUS: E-x-F-x(2)-G-[SA]-[LIVM]-C-x(4)-G-x-C-x-[LIVM]-S.

NAME: Amidases signature.

CONSENSUS: G-[GA]-S-S-[GS]-G-x-[GSA]-[GSAVY]-x-[LIVM]-[GSA]-x(6)-[GSA]-x-[GA]-x-D-
CONSENSUS: x-[GA]-x-S-[LIVM]-R-x-P-[GSAC].

NAME: Asparaginase / glutaminase active site signature 1.
CONSENSUS: [LIVM]-x(2)-T-G-G-T-[IV]-[AGS].

NAME: Asparaginase / glutaminase active site signature 2. 
CONSENSUS: G-x-[LIVM]-x(2)-H-G-T-D-T-[LIVM].

NAME: Urease nickel ligands signature. 

CONSENSUS: T-[AY]-[GA]-[GAT]-[LIVM]-D-x-H-[LIVM]-H-x(3)-P.

NAME: Urease active site.

CONSENSUS: [LIVM](2)-[CT]-H-[HN]-L-x(3)-[LIVM]-x(2)-D-[LIVM]-x-F-A.

NAME: ArgE / dapE / ACY1 / CPG2 / yscS family signature 1.
CONSENSUS: [LIV]-[GALMY]-[LIVMF]-x-[GSA]-H-x-D-[TV]-[STAV].

NAME: ArgE / dapE / ACY1 / CPG2 / yscS family signature 2. 

CONSENSUS: [GSTAI]-[SANQ]-D-x-K-[GSACN]-x(2)-[LIVMA]-x(2)-[LIVMFY]-x(14,17)-[LIVM]-
CONSENSUS: x-[LIVMF]-[LIVMSTAG]-[LIVMFA]-x(2)-[DNG]-E-E-x-[GSTN].

NAME: Dihydroorotase signature 1. 

CONSENSUS: D-[LIVMFYWSAP]-H-[LIVA]-H-[LIVF]-[RN]-x-[PGN].

NAME: Dihydroorotase signature 2. 
CONSENSUS: [GA]-[ST]-D-x-A-P-H-x(4)-K. 

NAME: Beta-lactamase class-A active site. 

CONSENSUS: [FY]-x-[LIVMFY]-x-S-[TV]-x-K-x(4)-[AGLM]-x(2)-[LC].

NAME: Beta-lactamase class-C active site. 
CONSENSUS: F-E-[LIVM]-G-S-[LIVMG]-[SA]-K. 

NAME: Beta-lactamase class-D active site. 

CONSENSUS: [PA]-x-S-[ST]-F-K-[LIV]-[PAL]-x-[STA]-[LI].

NAME: Beta-lactamases class B signature 1. 

CONSENSUS: [LI]-x-[STN]-[HN]-x-H-[GSTA]-D-x(2)-G-[GP]-x(7,8)-[GS].

NAME: Beta-lactamases class B signature 2. 

CONSENSUS: P-x(3)-[LIVM](2)-x-G-x-C-[LIVMF](2)-K.

NAME: Arginase family signature 1.

CONSENSUS: [LIVMF]-G-G-x-H-x-[LIVMT]-[STAV]-x-[PAG]-x(3)-[GSTA].

NAME: Arginase family signature 2. 

CONSENSUS: [LIVM](2)-x-[LIVMFY]-D-[AS]-H-x-D.

NAME: Arginase family signature 3. 

CONSENSUS: [ST]-[LIVMFY]-D-[LIVM]-D-x(3)-[PAQ]-x(3)-P-[GSA]-x(7)-G.

NAME: Adenosine and AMP deaminase signature.
CONSENSUS: [SA]-[LIVM]-[NGS]-[STA]-D-D-P.

NAME: Cytidine and deoxycytidylate deaminases zinc-binding region signature.

CONSENSUS: [CH]-[AGV]-E-x(2)-[LIVMFGAT]-[LIVM]-x(17,33)-P-C-x(2,8)-C-x(3)-[LIVM].

NAME: GTP cyclohydrolase I signature 1. 

CONSENSUS: [DEN]-[LIVM](2)-x(2)-[KRQN]-[DN]-[LIVM]-x(3)-[ST]-x-C-E-H-H.

NAME: GTP cyclohydrolase I signature 2.

CONSENSUS: [SA]-x-[RK]-x-Q-[LIVM]-Q-E-[RN]-[LI]-[TSN].

NAME: Nitrilases / cyanide hydrolase signature 1.

CONSENSUS: G-x(2)-[LIVMFY](2)-x-[IF]-x-E-x(2)-[LIVM]-x-G-Y-P.

NAME: Nitrilases / cyanide hydrolase active site signature. 

CONSENSUS: G-[GAQ]-x(2)-C-[WA]-E-[NH]-x(2)-[PST]-[LIVMFYS]-x-[KR].

NAME: Inorganic pyrophosphatase signature. 

CONSENSUS: D-[SGDN]-D-[PE]-[LIVMF]-D-[LIVMGAC].

NAME: Acylphosphatase signature 1.
CONSENSUS: [LIV]-x-G-x-V-Q-G-V-x-[FM]-R. 

NAME: Acylphosphatase signature 2.

CONSENSUS: G-[FYW]-[AVC]-[KRQAM]-N-x(3)-G-x-V-x(5)-G. 

NAME: ATP synthase alpha and beta subunits signature. 
CONSENSUS: P-[SAP]-[LIV]-[DNH]-x(3)-S-x-S. 

NAME: ATP synthase gamma subunit signature. 
CONSENSUS: [IV]-T-x-E-x(2)-[DE]-x(3)-G-A-x-[SAKR].

NAME: ATP synthase delta (OSCP) subunit signature. 

CONSENSUS: [LIVM]-x-[LIVMFYT]-x(3)-[LIVMT]-[DENQK]-x(2)-[LIVM]-x-[GSA]-G-[LIVMFYGA]-
CONSENSUS: x-[LIVM]-[KRHENQ]-x-[GSEN].

NAME: ATP synthase a subunit signature. 

CONSENSUS: [STAGN]-x-[STAG]-[LIVMF]-R-L-x-[SAGV]-N-[LIVMT]. 
NAME: ATP synthase c subunit signature. 

CONSENSUS: [GSTA]-R-[NQ]-P-x(10)-[LIVMFYW](2)-x(3)-[LIVMFYW]-x-[DE].

NAME: E1-E2 ATPases phosphorylation site. 
CONSENSUS: D-K-T-G-T-[LI]-[TI].

NAME: Sodium and potassium ATPases beta subunits signature 1. 

CONSENSUS: [FYW]-x(2)-[FYW]-x-[FYW]-[DN]-x(6)-[LIVM]-G-R-T-x(3)-W.

NAME: Sodium and potassium ATPases beta subunits signature 2. 
CONSENSUS: [RK]-x(2)-C-[RKQWI]-x(5)-L-x(2)-C-[SA]-G.

NAME: GDA1/CD39 family of nucleoside phosphatases signature. 

CONSENSUS: [LIVM]-x-G-x(2)-E-G-x-[FY]-x-[FW]-[LIVA]-[TAG]-x-N-[HY].

NAME: Iodothyronine deiodinases active site. 
CONSENSUS: R-P-L-V-x-N-F-G-S-[CA]-T-C-P-x-F. 

NAME: Cutinase, serine active site. 

CONSENSUS: P-x-[STA]-x-[LIV]-[IVT]-x-[GS]-G-Y-S-[QL]-G.

NAME: Cutinase, aspartate and histidine active sites. 

CONSENSUS: C-x(3)-D-x-[IV]-C-x-G-[GST]-x(2)-[LIVM]-x(2,3)-H.

NAME: DDC / GAD / HDC / TyrDC pyridoxal-phosphate attachment site.

CONSENSUS: S-[LIVMFYW]-x(5)-K-[LIVMFYWG](2)-x(3)-[LIVMFYW]-x-[CA]-x(2)-[LIVMFYWQ]-
CONSENSUS: x(2)-[RK]. 

NAME: Orn/Lys/Arg decarboxylases family 1 pyridoxal-P attachment site.
CONSENSUS: [STAV]-x-S-x-H-K-x(2)-[GSTAN](2)-x-[STA]-Q-[STA](2).

NAME: Orn/DAP/Arg decarboxylases family 2 pyridoxal-P attachment site. 
CONSENSUS: [FYMPA]-x-MSACVHNHCLFW]-x(4)-[L^^ 
CONSENSUS: [GTE].

NAME: Orn/DAP/Arg decarboxylases family 2 signature 2.

CONSENSUS: [GS]-x(2,6)-[LIVMSCP]-x(2)-[LIVMF]-[DNS]-[LIVMCA]-G-G-G-[LIVMFY]- 
CONSENSUS: [GSTPCEQ].

NAME: Orotidine 5'-phosphate decarboxylase active site.

CONSENSUS: [LIVMFTA]-[LIVMF]-x-D-x-K-x(2)-D-I-[GP]-x-T-[LIVMTA].

NAME: Phosphoenolpyruvate carboxylase active site 1. 
CONSENSUS: [VT]-x-T-A-H-P-T-[EQ]-x(2)-R-[KRH].

NAME: Phosphoenolpyruvate carboxylase active site 2. 
CONSENSUS: [IV]-M-[LIVM]-G-Y-S-D-S-x-K-D-[STAG]-G. 

NAME: Phosphoenolpyruvate carboxykinase (GTP) signature. 
CONSENSUS: F-P-S-A-C-G-K-T-N. 

NAME: Phosphoenolpyruvate carboxykinase (ATP) signature. 
CONSENSUS: L-I-G-D-D-E-H-x-W-x-[DE]-x-G-[IV]-x-N. 

NAME: Uroporphyrinogen decarboxylase signature 1.
CONSENSUS: P-x-W-x-M-R-Q-A-G-R. 

NAME: Uroporphyrinogen decarboxylase signature 2. 

CONSENSUS: G-F-[STAGCV]-[STAGC]-x-P-[FYW]-T-[LV]-x(2)-Y-x(2)-[AE]-[GK].

NAME: Indole-3-glycerol phosphate synthase signature.
CONSENSUS: [LIVMFY]-[LrYMC>x-E-[LIVMF^ 

NAME: Ribulose bisphosphate carboxylase large chain active site. 
CONSENSUS: G-x-[DN]-F-x-K-x-D-E. 

NAME: Fructose-bisphosphate aldolase class-I active site. 
CONSENSUS: [LIVM]-x-[LIVMFYW]-E-G-x-[LS]-L-K-P-[SN].

NAME: Fructose-bisphosphate aldolase class-II signature 1.

CONSENSUS: [FYVM]-x(1,3)-[LIVMH]-[APN]-[LIVM]-x(1,2)-[LIVM]-H-x-D-H-[GACH].

NAME: Fructose-bisphosphate aldolase class-II signature 2. 
CONSENSUS: [LIVM]-E-x-E-[LIVM]-G-x(2)-[GM]-[GSTA]-x-E.

NAME: Malate synthase signature. 

CONSENSUS: [KR]-[DENQ]-H-x(2)-G-L-N-x-G-x-W-D-Y-[LIVM]-F.

NAME: Hydroxymethylglutaryl-coenzyme A lyase active site. 
CONSENSUS: S-V-A-G-L-G-G-C-P-Y. 

NAME: Hydroxymethylglutaryl-coenzyme A synthase active site. 
CONSENSUS: N-x-[DN]-[IV]-E-G-[IV]-D-x(2)-N-A-C-[FY]-x-G.

NAME: Citrate synthase signature. 

CONSENSUS: G-[FYA]-[GA]-H-x-[IV]-x(1,2)-[RKT]-x(2)-D-[PS]-R.

NAME: Alpha-isopropylmalate and homocitrate synthases signature 1. 
CONSENSUS: L-R-[DE]-G-x-Q-x(10)-K.

NAME: Alpha-isopropylmalate and homocitrate synthases signature 2. 
CONSENSUS: [LIVMFW]-x(2)-H-x-H-[DN]-D-x-G-x-[GAS]-x-[GASLI].

NAME: KDPG and KHG aldolases active site. 
CONSENSUS: G-[LIVM]-x(3)-E-[LIV]-T-[LF]-R.

NAME: KDPG and KHG aldolases Schiff-base forming residue.
CONSENSUS: G-x(3)-[LIVM]-K-[LF]-F-P-[SA]-x(3)-G.

NAME: Isocitrate lyase signature. 
CONSENSUS: K-[KR]-C-G-H-[LMQ]. 

NAME: Beta-eliminating lyases pyridoxal-phosphate attachment site. 
CONSENSUS: Y-x-D-x(3)-M-S-[GA]-K-K-D-x-[LIVM](2)-x-[LIVM]-G-G.

NAME: DNA photolyases class 1 signature 1. 

CONSENSUS: T-G-x-P-[LIVM](2)-D-A-x-M-[RA]-x-[LIVM].

NAME: DNA photolyases class 1 signature 2.
CONSENSUS: [DN]-R-x-R-[LIVM](2)-x-[STA](2)-F-[LIVMFA]-x-K-x-L-x(2,3)-W-[KRQ].

NAME: DNA photolyases class 2 signature 1.
CONSENSUS: F-x-E-E-x-[LIVM](2)-R-R-E-L-x(2)-N-F.

NAME: DNA photolyases class 2 signature 2. 

CONSENSUS: G-x-H-D-x(2)-W-x-E-R-x-[LIVM]-F-G-K-[LIVM]-R-[FY]-M-N.

NAME: Eukaryotic-type carbonic anhydrases signature.

CONSENSUS: S-E-H-x-[LIVM]-x(4)-[FYH]-x(2)-E-[LIVM]-H-[LIVMFA](2).

NAME: Prokaryotic-type carbonic anhydrases signature 1. 
CONSENSUS: C-[SA]-D-S-R-[LIVM]-x-[AP].

NAME: Prokaryotic-type carbonic anhydrases signature 2. 

CONSENSUS: [EQ]-Y-A-[LIVM]-x(2)-[LIVM]-x(4)-[LIVMF](3)-x-G-H-x(2)-C-G.

NAME: Fumarate lyases signature. 
CONSENSUS: G-S-x(2)-M-x(2)-K-x-N. 

NAME: Aconitase family signature 1. 

CONSENSUS: [LIVM]-x(2)-[GSACIVM]-x-[LIV]-[GTIV]-[STP]-C-x(0,1)-T-N-[GSTANI]-x(4)-
CONSENSUS: [LIVMA].

NAME: Aconitase family signature 2. 

CONSENSUS: G-x(2)-[LIVWPQ]-x(3)-[GAC]-C-[GSTAM]-[LIMPTA]-C-[LIMV]-[GA].

NAME: Dihydroxy-acid and 6-phosphogluconate dehydratases signature 1.
CONSENSUS: C-D-K-x(2)-P-[GA]-x(3)-[GA].

NAME: Dihydroxy-acid and 6-phosphogluconate dehydratases signature 2. 
CONSENSUS: [SA]-L-[LIVM]-T-D-[GA]-R-[LIVMF]-S-[GA]-[GAV]-[ST].

NAME: Dehydroquinase class I active site. 

CONSENSUS: D-[LIVM]-[DE]-[LIVN]-x(18,20)-[LIVM](2)-x-[SC]-[NHY]-H-[DN].

NAME: Dehydroquinase class II signature.

CONSENSUS: [LIVM]-[NQ]-G-P-N-[LV]-x(2)-L-G-x-R-[QED]-P-x(2)-[FY]-G.

NAME: Enolase signature.

CONSENSUS: [LIV](3)-K-x-N-Q-I-G-[ST]-[LIV]-[ST]-[DE]-[STA].

NAME: Serine/threonine dehydratases pyridoxal-phosphate attachment site. 

CONSENSUS: [DESH]-x(4,5)-[STVG]-x-[AS]-[FYI]-K-[DLIFSA]-[RVMF]-[GA]-[LIVMGA].

NAME: Enoyl-CoA hydratase/isomerase signature.

CONSENSUS: [UVMHSTAJ-x-[LIVM]-PENQRHSTA]-G-x(3H^ 

CONSENSUS: [DQHP]-[LIVMFY].

NAME: Imidazoleglycerol-phosphate dehydratase signature 1. 
CONSENSUS: [UVMY)-[DE]-x-H-H-x(2^E-x(2)-[rc^ 

NAME: Imidazoleglycerol-phosphate dehydratase signature 2. 
CONSENSUS: G-x-[DN]-x-H-H-x(2)-E-[STAGC]-x-[FY]-K.

NAME: Tryptophan synthase alpha chain signature. 
CONSENSUS: [IJVM]-E-[LIVMhG-x(2)-[FYC)-[ST]-^ 

NAME: Tryptophan synthase beta chain pyridoxal-phosphate attachment site. 
CONSENSUS: [LIVM]-x-H-x-G-[STA]-H-K-x-N.

NAME: Delta-aminolevulinic acid dehydratase active site. 
CONSENSUS: G-x-D-x-[LIVM](2)-[IV]-K-P-[GSA]-x(2)-Y.

NAME: Urocanase active site. 
CONSENSUS: F-Q-G-L-P-x-R-I-C-W. 

NAME: Prephenate dehydratase signature 1. 

CONSENSUS: [FY]-x-[LIVM]-x(2)-[LIVM]-x(5)-[DN]-x(5)-T-R-F-[LIVMW]-x-[LIVM].

NAME: Prephenate dehydratase signature 2. 
CONSENSUS: [LIVM]-[ST]-[KR]-[LIVM]-E-[ST]-R-P.

NAME: Dihydrodipicolinate synthetase signature 1.
CONSENSUS: [GSA]-[LIVM]-[LIVMFY]-x(2)-G-[ST]-[TG]-G-E-[GASNF]-x(6)-[EQ].

NAME: Dihydrodipicolinate synthetase signature 2.

CONSENSUS: Y-[DNS]-[LIVMF]-P-x(2)-[ST]-x(3)-[LIVM]-x(13,14)-[LIVM]-x-[SGA]-[LIVMF]-
CONSENSUS: K-[DEQAF]-[STAC]. 

NAME: RsuA family of pseudouridine synthase signature.
CONSENSUS: G-R-L-D-x(2)-[ST]-x-G-[LIVMF](4)-[ST]-[DNT].

NAME: Cysteine synthase/cystathionine beta-synthase P-phosphate attachment site. 
CONSENSUS: K-x-E-x(3)-[PA]-[STAGC]-x-S-[IVAP]-K-x-R-x-[STAG]-x(2)-[LIVM]. 

NAME: Phenylalanine and histidine ammonia-lyases signature. 

CONSENSUS: G-[STG]-[LIVM]-[STG]-[AC]-S-G-[DH]-L-x-P-L-[SA]-x(2)-[SA].

NAME: Porphobilinogen deaminase cofactor-binding site. 

CONSENSUS: E-R-x-[LIVMFA]-x(3)-[LIVMF]-x-G-[GSA]-C-x-[IVT]-P-[LIVMF]-[GSA].

NAME: Cys/Met metabolism enzymes pyridoxal-phosphate attachment site.

CONSENSUS: [DQ]-[LIVMF]-x(3)-[STAGC]-[STAGCI]-T-K-[FYWQ]-[LIVMF]-x-G-[HQ]-[SGNH].

NAME: Glyoxalase I signature 1.

CONSENSUS: [HQ]-[IVT]-x-[LIVFY]-x-[IV]-x(5)-[STA]-x(2)-F-[YM]-x(2,3)-[LMF]-G-[LMF].

NAME: Glyoxalase I signature 2.

CONSENSUS: G-[NTKQ]-x(0,5)-[GA]-[LVFY]-[GH]-H-[IVF]-[CGA]-x-[STAGL]-x(2)-[DNC].

NAME: Cytochrome c and c1 heme lyases signature 1.
CONSENSUS: H-N-x(2)-N-E-x(2)-W-[NQKR]-x(4)-W-E. 

NAME: Cytochrome c and c1 heme lyases signature 2.
CONSENSUS: P-F-D-R-H-D-W. 

NAME: Adenylate cyclases class-I signature 1.
CONSENSUS: E-Y-F-G-[SA](2)-L-W-x-L-Y-K.

NAME: Adenylate cyclases class-I signature 2. 

CONSENSUS: Y-R-N-x-W-[NS]-E-[LIVM]-R-T-L-H-F-x-G. 

NAME: Guanylate cyclases signature. 

CONSENSUS: G-V-[LIVM]-x(0^G-x(5MF^-x-[LW^ 
CONSENSUS: [DNTA]-x(5)-[DE]. 

NAME: Chorismate synthase signature 1. 

CONSENSUS: G-E-S-H-[GC]-x(2)-[LIVM]-[GTV]-x-[LIVM](2)-[DE]-G-x-[PV]. 
NAME: Chorismate synthase signature 2. 

CONSENSUS: [GE]-R-[SA](2)-[SAG]-R-[EV]-[ST]-x(2)-[RH]-V-x(2)-G.

NAME: Chorismate synthase signature 3.

CONSENSUS: R-[SH]-D-[PSV]-[CSAV]-x(4)-[GAI]-x-[IVGSP]-[LIVM]-x-E-[STAH]-[LIVM].

NAME: 6-pyruvoyl tetrahydropterin synthase signature 1. 
CONSENSUS: C-N-N-x(2)-G-H-G-H-N-Y. 

NAME: 6-pyruvoyl tetrahydropterin synthase signature 2. 
CONSENSUS: D-H-K-N-L-D-x-D. 

NAME: Ferrochelatase signature. 

CONSENSUS: [LIVMF](2)-x-S-x-H-[GS]-[LIVM]-P-x(4,5)-[DENQKR]-x-G-D-x-Y.

NAME: Alanine racemase pyridoxal-phosphate attachment site. 
CONSENSUS: V-x-K-A-[DN]-[GA]-Y-G-H-G. 

NAME: Aspartate and glutamate racemases signature 1.

CONSENSUS: [IVA]-[LIVM]-x-C-x(0,1)-N-[ST]-[MSA]-[STH]-[LIVFYSTANK].

NAME: Aspartate and glutamate racemases signature 2.

CONSENSUS: [LIVM](2)-x-[AG]-C-T-[DEH]-[LIVMFY]-[PNGRS]-x-[LIVM].

NAME: Mandelate racemase / muconate lactonizing enzyme family signature 1.
CONSENSUS: A-x-[SAG](2)-[LIVM]-[DE]-x-A-x(2)-D-x(2)-[GA]-[KR].

NAME: Mandelate racemase / muconate lactonizing enzyme family signature 2.
CONSENSUS: G-x(2)-D-x(9)-A-x(14)-[LIVM]-E-[DENQ]-P-x(4)-[DENQ].

NAME: Ribulose-phosphate 3-epimerase family signature 1.

CONSENSUS: [LIVMF]-H-[LIVMFY]-D-[LIVM]-x-D-x(1,2)-[FY]-[LIVM]-x-N-x-[STAV].

NAME: Ribulose-phosphate 3-epimerase family signature 2.

CONSENSUS: [LIVMA]-x-[LIVM]-M-[ST]-[VS]-x-P-x(3)-G-Q-x-F-x(6)-[NK]-[LIVMC].

NAME: Aldose 1-epimerase putative active site. 
CONSENSUS: [NS]-x-T-N-H-x-Y-[FW]-N-[LI].

NAME: Cyclophilin-type peptidyl-prolyl cis-trans isomerase signature. 

CONSENSUS: [FY]-x(2)-[STCNLV]-x-F-H-[RH]-[LIVMN]-[LIVM]-x(2)-F-[LIVM]-x-Q-[AG]-G.

NAME: Cyclophilin-type peptidyl-prolyl cis-trans isomerase profile.

NAME: FKBP-type peptidyl-prolyl cis-trans isomerase signature 1.

CONSENSUS: [LIVMC]-x-[YF]-x-[GVL]-x(1,2)-[LFT]-x(2)-G-x(3)-[DE]-[STAEQK]-[STAN].

NAME: FKBP-type peptidyl-prolyl cis-trans isomerase signature 2.
CONSENSUS: [LJ\MFY}-x(2)-[GA]-x(3,4)-[LIVMF]-xa^ 
CONSENSUS: x(3)-[PSGAQ]-x(2)-[AG]-[FY]-G.

NAME: FKBP-type peptidyl-prolyl cis-trans isomerase domain profile.

NAME: PpiC-type peptidyl-prolyl cis-trans isomerase signature. 

CONSENSUS: F-[GSADEI]-x-[LVAQ]-A-x(3)-[ST]-x(3,4)-[STQ]-x(3,5)-[GER]-G-x-[LIVM]-
CONSENSUS: [GS]. 

NAME: Triosephosphate isomerase active site. 
CONSENSUS: [AV]-Y-E-P-[LIVM]-W-[SA]-I-G-T-[GK].

NAME: Xylose isomerase signature 1. 
CONSENSUS: [LI]-E-P-K-P-x(2)-P.

NAME: Xylose isomerase signature 2. 

CONSENSUS: [FL]-H-D-x-D-[LIV]-x-[PD]-x-[GDE].

NAME: Phosphomannose isomerase type I signature 1.
CONSENSUS: Y-x-D-x-N-H-K-P-E. 

NAME: Phosphomannose isomerase type I signature 2. 

CONSENSUS: H-A-Y-[LIVM]-x-G-x(2)-[LIVM]-E-x-M-A-x-S-D-N-x-[LIVM]-R-A-G-x-T-P-K.

NAME: Phosphoglucose isomerase signature 1.

CONSENSUS: [DENS]-x-[LIVM]-G-G-R-[FY]-S-[LIVMT]-x-[STA]-[PSAC]-[LIVMA]-G.

NAME: Phosphoglucose isomerase signature 2.

CONSENSUS: [GS]-x-[LIVM]-[LIVMFYW]-x(4)-[FY]-[DN]-Q-x-G-V-E-x(2)-K.

NAME: Glucosamine/galactosamine-6-phosphate isomerases signature. 

CONSENSUS: [LIVM]-x(3)-G-x-[LIT]-x-[LIV]-x-[LIVM]-x-G-[LIVM]-G-x-[DEN]-G-H.

NAME: Phosphoglycerate mutase family phosphohistidine signature.
CONSENSUS: [LIVM]-x-R-H-G-[EQ]-x(3)-N.

NAME: Phosphoglucomutase and phosphomannomutase phosphoserine signature. 
CONSENSUS: [GSA]-[LIVM]-x-[LIVM]-[ST]-[PGA]-S-H-x-P-x(4)-[GNHE].

NAME: Methylmalonyl-CoA mutase signature. 

CONSENSUS: R-I-A-R-N-[TQ]-x(2)-[LIVMFY](2)-x-[EQ]-E-x(4)-[KRN]-x(2)-D-P-x-[GSA]-
CONSENSUS: G-S. 

NAME: Terpene synthases signature. 

CONSENSUS: [DE]-G-S-W-x-G-x-W-[GA]-[LIVM]-x-[FY]-x-Y-[GA].

NAME: Eukaryotic DNA topoisomerase I active site.

CONSENSUS: [DEN]-x(6)-[GS]-[IT]-S-K-x(2)-Y-[LIVM]-x(3)-[LIVM].

NAME: Prokaryotic DNA topoisomerase I active site.

CONSENSUS: [EQ]-x-L-Y-[DEQT]-x(3,12)-[LI]-[ST]-Y-x-R-[ST]-[DEQS].

NAME: DNA topoisomerase II signature. 
CONSENSUS: [LIVMA]-x-E-G-[DN]-S-A-x-[STAG].

NAME: Aminoacyl-transfer RNA synthetases class-I signature.

CONSENSUS: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-
CONSENSUS: [LIVMFYSTAGPC].

NAME: Aminoacyl-transfer RNA synthetases class-II signature 1.
CONSENSUS: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE].

NAME: Aminoacyl-transfer RNA synthetases class-II signature 2. 
CONSENSUS: [GSTALVH-{DENQHRKPMGSTA]-[LrV^ 

NAME: WHEP-TRS domain signature.

CONSENSUS: [QY]-G-[DNEA]-x-[LIV]-[KR]-x(2)-K-x(2)-[KRNG]-[AS]-x(4)-[LIV]-[DENK]-
CONSENSUS: x(2)-[IV]-x(2)-L-x(3)-K.

NAME: ATP-citrate lyase / succinyl-CoA ligases family signature 1.

CONSENSUS: S-[KR]-S-G-[GT]-[LIVM]-[GST]-x-[EQ]-x(8,10)-G-x(4)-[LIVM]-[GA]-[LIVM]-G-
CONSENSUS: G-D. 

NAME: ATP-citrate lyase / succinyl-CoA ligases family active site. 
CONSENSUS: G-x(2)-A-x(4,7)-[RQT]-[LIVMF]-G-H-[AS]-[GH].

NAME: ATP-citrate lyase / succinyl-CoA ligases family signature 3. 

CONSENSUS: G-x-[IV]-x(2)-[LIVMF]-x-[NA]-G-[GA]-G-[LA]-[STAV]-x(4)-D-x-[LIVM]-x(3)-
CONSENSUS: G-[GRE].

NAME: Glutamine synthetase signature 1.

CONSENSUS: [FYWL]-D-G-S-S-x(6,8)-[DENQSTAK]-[SA]-[DE]-x(2)-[LIVMFY].

NAME: Glutamine synthetase putative ATP-binding region signature. 
CONSENSUS: K-P-[LIVMFYA]-x(3,5)-[NPAT]-G-[GSTAN]-G-x-H-x(3)-S.

NAME: Glutamine synthetase class-I adenylation site. 
CONSENSUS: K-[LIVM]-x(5)-[LIVMA]-D-[RK]-[DN]-[LI]-Y.

NAME: D-alanine-D-alanine ligase signature 1.

CONSENSUS: H-G-x(2)-G-E-D-G-x-[LIVMA]-[QSA]-[GSA].

NAME: D-alanine-D-alanine ligase signature 2. 

CONSENSUS: [LIV]-x(3)-[GA]-x-[GSAIV]-R-[LIVCA]-D-[LIVMF](2)-x(7,9)-[LI]-x-E-
CONSENSUS: [LIVA]-N-[STP]-x-P-[GA].

NAME: SAICAR synthetase signature 1. 

CONSENSUS: [LIVMF](2)-P-[LIVM]-E-x-[LIVM]-[LIVMCA]-R-x(3)-[TA]-G-S.

NAME: SAICAR synthetase signature 2. 

CONSENSUS: [LIVM]-[LIVMA]-D-x-K-[LIVMFY]-E-F-G.

NAME: Folylpolyglutamate synthase signature 1.

CONSENSUS: [LIVMFY]-x-[LIVM]-[STAG]-G-T-[NK]-G-K-x-[ST]-x(7)-[LIVM](2)-x(3)-[GSK].

NAME: Folylpolyglutamate synthase signature 2.

CONSENSUS: [LIVMFY](2)-E-x-G-[LIVM]-[GA]-G-x(2)-D-x-[GST]-x-[LIVM](2).

NAME: Ubiquitin-activating enzyme signature 1. 
CONSENSUS: K-A-C-S-G-K-F-x-P. 

NAME: Ubiquitin-activating enzyme active site. 
CONSENSUS: P-[LIVM]-C-T-[LIVM]-[KRH]-x-[FT]-P.

NAME: Ubiquitin-conjugating enzymes active site. 

CONSENSUS: [FYWLSP]-H-[PC]-[NH]-[LIV]-x(3,4)-G-x-[LIV]-C-[LIV]-x-[LIV].

NAME: Formate-tetrahydrofolate ligase signature 1.
CONSENSUS: G-[LIVM]-K-G-G-A-A-G-G-G-Y. 

NAME: Formate-tetrahydrofolate ligase signature 2. 
CONSENSUS: V-A-T-[IV]-R-A-L-K-x-[HN]-G-G.

NAME: Adenylosuccinate synthetase GTP-binding site. 
CONSENSUS: Q-W-G-D-E-G-K-G. 

NAME: Adenylosuccinate synthetase active site. 
CONSENSUS: G-I-[GR]-P-x-Y-x(2)-K-x(2)-R.

NAME: Argininosuccinate synthase signature 1.
CONSENSUS: A-[FY]-S-G-G-L-D-T-S. 

NAME: Argininosuccinate synthase signature 2. 
CONSENSUS: G-x-T-x-K-G-N-D-x(2)-R-F. 

NAME: Phosphoribosylglycinamide synthetase signature. 
CONSENSUS: R-F-G-D-P-E-x-[QM].

NAME: Carbamoyl-phosphate synthase subdomain signature 1. 

CONSENSUS: [FYV]-[PS]-[LIVMC]-[LIVMA]-[LIVM]-[KR]-[PSA]-[STA]-x(3)-[SG]-G-x-[AG].

NAME: Carbamoyl-phosphate synthase subdomain signature 2.

CONSENSUS: [LIVMF]-[LIMN]-E-[LIVMCA]-N-[PATLIVM]-[KR]-[LIVMSTAC].

NAME: ATP-dependent DNA ligase AMP-binding site. 
CONSENSUS: [EDQH]-x-K-x-[DN]-G-x-R-[GACIVM].

NAME: ATP-dependent DNA ligase signature 2. 

CONSENSUS: E-G-[LIVMA]-[LIVM](2)-[KR]-x(5,8)-[YW]-[QNEK]-x(2,6)-[KRH]-x(3,5)-K-
CONSENSUS: [LIVMFY]-K.

NAME: NAD-dependent DNA ligase signature 1.

CONSENSUS: K-[LIVM]-D-G-[LIVM]-[SA]-x(4)-Y-x(2)-G-x-L-x(4)-[ST]-R-G-[DN]-G-x(2)-G-
CONSENSUS: [DE]-[DENL]. 

NAME: NAD-dependent DNA ligase signature 2. 

CONSENSUS: [IV]-G-[KR]-[ST]-G-x-[LIVM]-[STNK]-x-[VT]-x(2)-L-x-[PS]-V.

NAME: RNA 3'-terminal phosphate cyclase signature.
CONSENSUS: [RH]-G-x(2)-P-x-G(3)-x-[LIV]. 

NAME: Lipoate-protein ligase B signature.

CONSENSUS: R-G-G-x(2)-T-[FYW]-H-x(2)-[GH]-Q-x-[LIV]-x-Y. 

NAME: Isopenicillin N synthetase signature 1.
CONSENSUS: [RK]-x-[STA]-x(2)-S-x-C-Y-[SL]. 

NAME: Isopenicillin N synthetase signature 2. 

CONSENSUS: [LIVM](2)-x-C-G-[STA]-x(2)-[STAG]-x(2)-T-x-[DNG].

NAME: Site-specific recombinases active site. 
CONSENSUS: Y-[LIVAC]-R-[VA]-S-[ST]-x(2)-Q. 

NAME: Site-specific recombinases signature 2. 

CONSENSUS: G-[DE]-x(2)-[LIVM]-x(3)-[LIVM]-[DT]-R-[LIVM]-[GSA].

NAME: Transposases, Mutator family, signature.

CONSENSUS: D-x(3)-G-[LIVMF]-x(6)-[STAV]-[LIVMFYW]-[PT]-x-[STAV]-x(2)-[QR]-x-C-x(2)-
CONSENSUS: H. 

NAME: Transposases, IS30 family, signature. 

CONSENSUS: R-G-x(2)-E-N-x-N-G-[LIVM](2)-R-[QE]-[LIVMFY](2)-P-K.

NAME: Autoinducers synthetases family signature.

CONSENSUS: [LMFY]-R-x(3)-F-x(2)-[KR]-x(2)-W-x-[LIVM]-x(6,9)-E-x-D-x-[FY]-D.

NAME: Thiamine pyrophosphate enzymes signature.

CONSENSUS: [LIVMF]-[GSA]-x(5)-P-x(4)-[LIVMFYW]-x-[LIVMF]-x-G-D-[GSA]-[GSAC].

NAME: Biotin-requiring enzymes attachment site.

CONSENSUS: [GN]-[DEQTR]-x-[LIVMFY]-x(2)-[LIVM]-x-[AIV]-M-K-[LMAT]-x(3)-[LIVM]-x-
CONSENSUS: [SAV]. 

NAME: 2-oxo acid dehydrogenases acyltransferase component lipoyl binding site. 
CONSENSUS: [GN]-x(2)-[LIVF]-x(5)-[LIVFC]-x(2)-[LIVFA]-x(3)-K-[STAIV]-[STAVQDN]-
CONSENSUS: x(2)-[LIVMFS]-x(5)-[GCN]-x-[LIVMFY].

NAME: Putative AMP-binding domain signature. 

CONSENSUS: [LIVMFY]-x(2)-[STG]-[STAG]-G-[ST]-[STEI]-[SG]-x-[PASLIVM]-[KR].

NAME: Molybdenum cofactor biosynthesis proteins signature 1.
CONSENSUS: [LIVM](3)-[LIT](2)-G-G-T-G-x(4)-D.

NAME: Molybdenum cofactor biosynthesis proteins signature 2.

CONSENSUS: S-x-[GS]-x(2)-D-x(5)-[LIVW]-x(10,12)-[LIV]-x(2)-[KR]-P-G-[KRL]-P-x(2)-
CONSENSUS: [LIVMF]-[GA]. 

NAME: moaA / nifB / pqqE family signature. 

CONSENSUS: [LIV]-x(3)-C-[NP]-[LIVMF]-[QRS]-C-x-[FYM]-C.

NAME: Radical activating enzymes signature. 

CONSENSUS: [GV]-x-G-x-[KR]-x(3)-F-x(2)-G-x(0,1)-C-x(3)-C-x(2)-C-x-[NL].

NAME: Tpx family signature. 

CONSENSUS: S-x-D-L-P-F-A-x(2)-[KR]-[FW]-C.

NAME: Cytochrome c family heme-binding site signature.
CONSENSUS: C-{CPWHF}-{CPWR}-C-H-{CFYW}. 

NAME: Cytochrome b5 family, heme-binding domain signature. 
CONSENSUS: [FY]-[LIVMK]-x(2)-H-P-[GA]-G.

NAME: Cytochrome b/b6 heme-ligand signature. 

CONSENSUS: [DENQ]-x(3)-G-[FYWMQ]-x-[LIVMF]-R-x(2)-H.

NAME: Cytochrome b/b6 Qo site signature. 
CONSENSUS: P-[DE]-W-[FY]-[LFY](2).

NAME: Cytochrome b559 subunits heme-binding site signature. 
CONSENSUS: [LIV]-x-lSTHLIVfl-R-[F^-x(2)-[I^ 

NAME: Nickel-dependent hydrogenases b-type cytochrome subunit signature 1.
CONSENSUS: R-[LIVMFYW]-x-H-W-[LIVM]-x(2)-[U^ 

NAME: Nickel-dependent hydrogenases b-type cytochrome subunit signature 2. 

CONSENSUS: [RH]-[STA]-[LIVMFYW]-H-[RH]-[LIVM]-x(2)-W-x-[LIVMF]-x(2)-F-x(3)-H.

NAME: Succinate dehydrogenase cytochrome b subunit signature 1. 

CONSENSUS: R-P-[LIVMT]-x(3)-[LIVM]-x(6)-[LIVMWPK]-x(4)-S-x(2)-H-R-x-[ST].

NAME: Succinate dehydrogenase cytochrome b subunit signature 2. 
CONSENSUS: H-x(3)-[GA]-[LIVMT]-R-[HF]-[LIVMF]-x-[FYWM]-D-x-[GVA].

NAME: Thioredoxin family active site. 

CONSENSUS: [LIVMF]-[LIVMSTA]-x-[LIVMFYC]-[FYWSTHE]-x(2)-[FYWGTN]-C-[GATPLVE]-
CONSENSUS: [PHYWSTA]-C-x(6)-[LIVMFYWT].

NAME: Glutaredoxin active site. 

CONSENSUS: [LIVD]-[FYSA]-x(4)-C-[PV]-[FYW]-C-x(2)-[TAV]-x(2,3)-[LIV].

NAME: Type-1 copper (blue) proteins signature.

CONSENSUS: [GA]-x(0,2)-[YSA]-x(0,1)-[VFY]-x-C-x(1,2)-[PG]-x(0,1)-H-x(2,4)-[MQ].

NAME: 2Fe-2S ferredoxins, iron-sulfur binding region signature. 
CONSENSUS: C-{C}-{C}-[GA]-{C}-C-[GAST]-{CPDEKRHFYW}-C.

NAME: Adrenodoxin family, iron-sulfur binding region signature. 
CONSENSUS: C-x(2)-[STAQ]-x-[STAMV]-C-[STA]-T-C-[HR].

NAME: 4Fe-4S ferredoxins, iron-sulfur binding region signature. 
CONSENSUS: C-x(2)-C-x(2)-C-x(3)-C-[PEG]. 

NAME: High potential iron-sulfur proteins signature.
CONSENSUS: C-x(6,9)-[LIVM]-x(3)-G-[YW]-C-x(2)-[FYW].

NAME: Rieske iron-sulfur protein signature 1. 
CONSENSUS: C-[TK]-H-L-G-C-[LIVT].

NAME: Rieske iron-sulfur protein signature 2. 
CONSENSUS: C-P-C-H-x-[GSA]. 

NAME: Flavodoxin signature. 

CONSENSUS: [LIV]-[UVFY]-[FY]-x-[ST]^ 

NAME: Rubredoxin signature. 

CONSENSUS: [LIVM]-x(3)-W-x-C-P-x-C-[AGD].

NAME: Electron transfer flavoprotein alpha-subunit signature.

CONSENSUS: [LI]-Y-[LIVM]-[AT]-x-G-[IV]-[SD]-G-x-[IV]-Q-H-x(2)-G-x(6)-[IV]-x-A-
CONSENSUS: [IV]-N. 

NAME: Electron transfer flavoprotein beta-subunit signature. 

CONSENSUS: [IVA]-x-[KR]-x(2)-[DE]-[GD]-[GDE]-x(1,2)-[EQ]-x-[LIV]-x(4)-P-x-[LIVM](2)-
CONSENSUS: [TAC]. 

NAME: Vertebrate metallothioneins signature. 

CONSENSUS: C-x-C-[GSTAP]-x(2)-C-x-C-x(2)-C-x-C-x(2)-C-x-K.

NAME: Ferritin iron-binding regions signature 1. 

CONSENSUS: E-x-[KR]-E-x(2)-E-[KR]-[LF]-[LIVMA]-x(2)-Q-N-x-R-x-G-R.

NAME: Ferritin iron-binding regions signature 2. 
CONSENSUS: D-x(2)-[LlVMFMSTAC]-[DH>F-[W 

NAME: Bacterioferritin signature. 

CONSENSUS: <M-x-G-x(3)-V-[LIV]-x(2)-[LM]-x(3)-L-x(3)-L. 
NAME: Transferrins signature 1. 

CONSENSUS: Y-x(0,1)-[VAS]-V-[IVAC]-[IVA]-[IVA]-[RKH]-[RKS]-[GDENSA].

NAME: Transferrins signature 2.

CONSENSUS: Y-x-G-A-[FL]-[KRHNQ]-C-L-x(3,4)-G-[DENQ]-V-[GA]-[FYW]. 
NAME: Transferrins signature 3. 

CONSENSUS: [DENQ]-[YF]-x-[LY]-L-C-x-[DN]-x(5,8)-[LIV]-x(4,5)-C-x(2)-A-x(4)-[HQR]-x-
CONSENSUS: [LIVMFYW]-[LIVM].

NAME: Globins profile. 

NAME: Protozoan/cyanobacterial globins signature. 

CONSENSUS: F-[LF]-x(5)-G-[PA]-x(4)-G-[KRA]-x-[LIVM]-x(3)-H.

NAME: Plant hemoglobins signature. 
CONSENSUS: [SN]-P-x-L-x(2)-H-A-x(3)-F. 

NAME: Hemerythrins signature. 
CONSENSUS: W-L-x-[NQ]-H-I-x(3)-D-F. 

NAME: Arthropod hemocyanins / insect LSPs signature 1.
CONSENSUS: Y-[FYW]-x-E-D-[LIVM]-x(2)-N-x(6)-H-x(3)-P. 

NAME: Arthropod hemocyanins / insect LSPs signature 2. 
CONSENSUS: T-x(2)-R-D-P-x-[FY]-[FYW].

NAME: Heavy-metal-associated domain. 

CONSENSUS: [LIVN]-x(2)-[LIVMFA]-x-C-x-[STAGCDNH]-C-x(3)-[LIVFG]-x(3)-[LIV]-x(9,11)-
CONSENSUS: [IVA]-x-[LVFYS].

NAME: ABC transporters family signature. 

CONSENSUS: [LIVMFYC]-[SA]-[SAPGLVFYKQH]-G-[DENQMW]-[KRQASPCLIMFW]-[KRNQSTAVM]-
CONSENSUS: [KRACLVM]-[LIVMFYPAN]-{PHY}-[LIVMFW]-[SAGCLIVP]-{FYWHP}-{KRHP}-
CONSENSUS: [LIVMFYWSTA].

NAME: Binding-protein-dependent transport systems inner membrane comp. sign. 

CONSENSUS: [LIVMFY]-x(8)-[EQR]-[STAGV]-[STAG]-x(3)-G-[LIVMFYSTAC]-x(5)-[LIVMFYSTA]-
CONSENSUS: x(4)-[LIVMFY]-[PKR].

NAME: ABC-2 type transport system integral membrane proteins signature. 
CONSENSUS: [LIMST>x(2)-[LIMW)-x(2)-[LlMCA^^ 
CONSENSUS: x(9,12)-P-[LIMFT]-x-[HRSY]-x(5)-[RQ].

NAME: Bacterial extracellular solute-binding proteins, family 1 signature. 
CONSENSUS: [GAI^-[lJVMFAHSTAVDr^-x(4MGSAVHUVM 
CONSENSUS: [KNDE].

NAME: Bacterial extracellular solute-binding proteins, family 3 signature.
CONSENSUS: G-[FYILHDEHUVMT]-[DE]-[LIVMFW 

NAME: Bacterial extracellular solute-binding proteins, family 5 signature. 
CONSENSUS: [AG]-x(6,7)-[DNEG]-x(2)-[STAVE]-[LIVMFYWA]
CONSENSUS: [KRHDE]-[GDN]-[LIVMA]-[KNGSP]-[FW].

NAME: Serum albumin family signature.

CONSENSUS: [FY]-x(6)-C-C-x(7)-C-[LFY]-x(6)-[LIVMFYW].

NAME: Transthyretin signature 1.

CONSENSUS: S-K-C-P-L-M-V-K-V-L-D-[AS]-V-R-G. 

NAME: Transthyretin signature 2.

CONSENSUS: S-P-[FY]-S-[FY]-S-T-T-A-[LIVM]-V-[ST]-x-P.

NAME: Avidin / Streptavidin family signature.

CONSENSUS: [DEN]-x(2)-[KR]-[STA]-x(2)-V-G-x-[DN]-x-[FW]-T-[KR].

NAME: Eukaryotic cobalamin-binding proteins signature. 
CONSENSUS: [SN]-V-D-T-[GA]-A-[LIVM]-A-x-L-A-[LIVMF]-T-C.

NAME: Lipocalin signature. 

CONSENSUS: [DENG]-x-[DENQGSTARK]-x(0,2)-[DENQARK]-[LIVFY]-{CP}-G-{C}-W-[FYWLRH]-x-
CONSENSUS: [LIVMTA].

NAME: Cytosolic fatty-acid binding proteins signature. 
CONSENSUS: [GSAIVKJ-x-[FYW]-x-[LIVMfl-x(4MNHG> 
CONSENSUS: [LIVMAKR].

NAME: Acyl-CoA-binding protein signature. 
CONSENSUS: P-[STA]-x-rDEN]-x-[UVMn-x(2)-[L^ 

NAME: LBP / BPI / CETP family signature. 

CONSENSUS: [PA]-[GA]-[LIVMC]-x(2)-R-[IV]-[ST]-x(3)-L-x(5)-[EQ]-x(4)-[LIVM]-[EQK]-
CONSENSUS: x(8)-P. 

NAME: Phosphatidylethanolamine-binding protein family signature.
CONSENSUS: [FY]-x-[LIVMF](3)-x-[DC]-P-D-x-P-[SN]-x(10)-H.

NAME: Plant lipid transfer proteins signature. 

CONSENSUS: [UVM]-[PA]-x(2K-x-[LIVM]-x-[LIVM]-x-[UVMFY]-x-[UVM] 
CONSENSUS: [DN]-C-x(2)-[LIVM]. 

NAME: Uteroglobin family signature 1.

CONSENSUS: [GA]-x(3)-I^C-P-x-[LIVMF]-x(3)-lLIVM]-[DE3-x-[LrVMF](2). 
NAME: Uteroglobin family signature 2. 

CONSENSUS: [DEQ]-x(4)-[SN]-x(5)-[DEQ]-x-I-x(2)-S-[PSE]-[LS]-C. 

NAME: Mitochondrial energy transfer proteins signature. 

CONSENSUS: P-x-[DE]-x-[LIVAT]-[RK]-x-[LRH]-[LIVMFV]-[QMAIGV].

NAME: Sugar transport proteins signature 1.

CONSENSUS: [LIVMSTAGl-tLrVMFSAGl^-lLIVMSAl-lDEl-x-rLIVMFYWAl-G-R-tRK]^^)- 
CONSENSUS: [GSTA].

NAME: Sugar transport proteins signature 2. 

CONSENSUS: [LIVMF]-x-G-[LIVMFA]-x(2)-G-x(8)-[LIFY]-x(2)-[EQ]-x(6)-[RK].

NAME: LacY family proton/sugar symporters signature 1.
CONSENSUS: G-[LIVM](2)-x-D-[RK]-L-G-L-[RK](2)-x-[LIVM](2)-W. 

NAME: LacY family proton/sugar symporters signature 2. 

CONSENSUS: P-x-[LIVMF](2)-N-R-[LIVM]-G-x-K-N-[STA]-[LIVM](3). 

NAME: PTR2 family proton/oligopeptide symporters signature 1.

CONSENSUS: [GA]-[GAS]-[UVMFWA]-tLIVM]-[GAS3-D-x-[LIVMFYWT]-[LlVMFW]^^ 
CONSENSUS: [IVj-x(3)-[GSTAV)-x-[UVMF]-x(3)-[GA3. 

NAME: PTR2 family proton/oligopeptide symporters signature 2.
CONSENSUS: [FYTM2MUtfFY]-[ITV]-[LlVM 

NAME: Amiloride-sensitive sodium channels signature.
CONSENSUS: Y-x(2)-[EQTF]-x-C-x(2)-[GSTDNL]<^^ 

NAME: Sodium:alanine symporter family signature. 

CONSENSUS: G-G-x-[GA](2)-[LIVM]-F-W-M-W-[LIVM]-x-[STAV]-[LIVMFA](2Ki.
NAME: Sodium:dicarboxylate symporter family signature 1.

CONSENSUS: P-x(0,1)-G-[DE]-x-[LIVMF](2)-x-[LIVM](2)-[KREQ]-[LIVM](3)-x-P.
NAME: Sodium:dicarboxylate symporter family signature 2.

CONSENSUS: P-x^-x-[STA]-x-[NT]-[LIVMC]-D-G-[STAN]-x-[LIVM]-[FY]-x(2)-[LIVM]-x(2)-
CONSENSUS: [LIVM]-[FY]-[UMSA]-Q. 

NAME: Sodium:galactoside symporter family signature. 

CONSENSUS: r>x(3)-G-x(3)-[DN]-x(6,8)-G-[KH]-F-[KR]-P-[FYW]-[LIVM](2)-x-[GSTA](2).

NAME: Sodium:neurotransmitter symporter family signature 1. 
CONSENSUS: W-R-F-[GP]-Y-x(4)-N-G-G-G-x-[FY]. 

NAME: Sodium:neurotransmitter symporter family signature 2.

CONSENSUS: Y-[LIVMFY]-x(2)-[SC]-[LIVMFY]-[STQ]-x(2)-L-P-W-x(2)-C-x(4)-N-[GST].

NAME: Sodium:solute symporter family signature 1. 
CONSENSUS: [GSl-x(2)-[LIY]-x(3)-[UVMFmSTAG](10)-^ 
CONSENSUS: [SAP]. 

NAME: Sodium:solute symporter family signature 2. 

CONSENSUS: [GAST]-[LIVM]-x(3)-[KR]-x(4)^-A-x(2)-[GAS]-[LIVMGS]-[LIVMW]-[LIVMGAT]-G-
CONSENSUS: x-[LIVMG]. 

NAME: Sodium:sulfate symporter family signature. 

CONSENSUS: [STACP]-S-x(2)-F-x(2)-P-[LIVM]-[GSA]-x(3)-N-x-[LIVM]-V.

NAME: glpT family of transporters signature. 
CONSENSUS: R-G-x(5)-W-N-x(2)-H-N-x-G-G. 

NAME: Ammonium transporters signature. 

CONSENSUS: D-[rWS]-A^-[GSC]-x(2)-[IV]-x(3)-[SAG](2)-x(2)-[SAG]-[LIVMF]-x(3)-
CONSENSUS: [LIVMFYWA](2)-x-[GK]-x-R. 

NAME: BCCT family of transporters signature. 
CONSENSUS: [GSDN]-W-T-[LIVM]-x-[FY]-W-x-W-W. 

NAME: Flagellar motor protein motA family signature. 
CONSENSUS: A-[LMF]-x-[GAT]-T-[LIVF]-x-G-x-[LIVMF]-x(7)-P.

NAME: Formate and nitrite transporters signature 1 . 

CONSENSUS: [LIVMA]-[LIVMY]-x-G-[GSTA]-[DES]-L-[FI]-[TN]-[GS].

NAME: Formate and nitrite transporters signature 2. 
CONSENSUS: [GA]-x(2)-[CA]-N-[LIVMFYW](2)-V-C-[LV]-A.

NAME: Prokaryotic sulfate-binding proteins signature 1.
CONSENSUS: K-x-[NQEK]-[GT]-G-[DQ]-x-[LIVM]-x(3)-Q-S.

NAME: Prokaryotic sulfate-binding proteins signature 2.
CONSENSUS: N-P-K-[ST]-S-G-x-A-R.

NAME: Sulfate transporters signature. 

CONSENSUS: P-x-Y-[GS]-l^Y-[STAG](2)-x(4)-[LIVMFY](3)-x(3)-[GSTA](2)-S-[KR].
NAME: Amino acid permeases signature. 

CONSENSUS: [STAGC]<!-[PAG]-x(2,3)-[LIVMFYWA](2)-x-[LIVMFYW]-x-[LIVMFWSTAGC](2)-
CONSENSUS: [STAGC]-x(3)-[LIVMFYW]-x-[LIVMST]-x(3)-[LIMCTA]-[GA]-E-x(5)-[PSAL].

NAME: Aromatic amino acids permeases signature. 

CONSENSUS: I-G-[GA]-G-M-[LF]-[SA]-x-P-x(3)-[SA]-G-x(2)-F.

NAME: Xanthine/uracil permeases family signature. 

CONSENSUS: [LIVM]-P-x-[PASIF]-V-[LIVM]-G-G-x(4)-[LIVM]-[FY]-[GSA]-x-[LIVM]-x(3)-G.

NAME: Anion exchangers family signature 1. 

CONSENSUS: F-G-G-[LIVM](2)-[KR]-D-[LIVM]-[RK]-R-R-Y.

NAME: Anion exchangers family signature 2. 
CONSENSUS: [FQ-L-I^-M-F-I-Y-E-T-F-x-K-L. 

NAME: MIP family signature. 

CONSENSUS: [HNQA]-x-N-P-[STA]-[LIVMF]-[ST]-[LIVMF]-[GSTAFY].



NAME: General diffusion Gram-negative porins signature. 

CONSENSUS: [LIVMFY]-x(2)-G-x(2)-Y-x-F-x-K-x(2)-[SN]-[STAV]-[LIVMFYW]-V.
NAME: OmpA-like domain. 

CONSENSUS: [LIVMA]-x-[GT]-x-[TA]-[DA]-x(2)-[DG]-[GSTP]-x(2)-[LFYDE]-[NQS]-x(2)-
CONSENSUS: [LI]-[SG]-[QE]-[KRQE]-R-A-x(2)-[LV]-x(3)-[LIVMF]-x(4,5)-[LIVM]-x(4)-
CONSENSUS: [LIVM]-x(3)-[SG]-x-G.

NAME: Eukaryotic mitochondrial porin signature. 

CONSENSUS: [YH]-x(2)-D-[SPA]-x-[STA]-x(3)-[TAG]-[KR]-[LIVMF]-[DNSTA]-[DNS]-x(4)-
CONSENSUS: [GSTAN]-[LIVMA]-x-[LIVMY].

NAME: Insulin-like growth factor binding proteins signature. 
CONSENSUS: G-C-[GS]-C-C-x(2)-C-A-x(6)-C. 

NAME: GPR1/FUN34/yaaH family signature.
CONSENSUS: N-P-[AV]-P-[LF]-G-L-x-[GSA]-F.

NAME: GNS1/SUR4 family signature. 
CONSENSUS: L-x-F-L-H-x-Y-H-H. 

NAME: 43 Kd postsynaptic protein signature. 
CONSENSUS: G-Q-D-Q-T-K-Q-Q-I. 

NAME: Actins signature 1 . 

CONSENSUS: [FY]-[LIV]-G-[DE]-E-A-Q-x-[RKQ](2)-G.
NAME: Actins signature 2. 

CONSENSUS: W-[IV]-[STA]-[RK]-x-[DE]-Y-[DNE]-[DE].
NAME: Actins and actin-related proteins signature. 

CONSENSUS: [LM]-[LIVM]-T-E-[GAPQ]-x-[LIVMFYWHQ]-N-[PSTAQ]-x(2)-N-[KR].
NAME: Annexins repeated domain signature. 

CONSENSUS: [TG]-[STV]-x(8)-[LIVMF]-x(2)-R-x(3)-[DEQNH]-x(7)-[IFY]-x(7)-[LIVMF]-
CONSENSUS: x(3)-[LIVMF]-x(11)-[LIVMFA]-x(2)-[LIVMF].

NAME: Caveolins signature. 
CONSENSUS: F-E-D-V-I-A-E-P. 

NAME: Clathrin light chain signature 1. 
CONSENSUS: F-L-A-Q-Q-E-S. 

NAME: Clathrin light chain signature 2. 

CONSENSUS: [KR]-D-x-S-[KR]-[LIVM]-[KR]-x-[LIVM](3)-x-L-K. 

NAME: Clusterin signature 1 . 
CONSENSUS: C-K-P-C-L-K-x-T-C. 

NAME: Clusterin signature 2. 

CONSENSUS: C-L-[RK]-M-[RK]-x-[EQ]-C-[ED]-K-C.
NAME: Connexins signature 1 . 

CONSENSUS: C-[DN]-T-x-Q-P-G-C-x(2)-V-C-Y-D.
NAME: Connexins signature 2. 

CONSENSUS: C-x(3,4)-P-C-x(3)-[LIVM]-[DEN]-C-[FY]-[LIVM]-[SA]-[KR]-P.

NAME: Crystallins beta and gamma 'Greek key' motif signature.
CONSENSUS: [LIVWiTVA)-x-{DEHRKSTP}-[FYMDEQHK 

NAME: Dynamin family signature. 

CONSENSUS: L-P-[RK]-G-[STN]-[GN]-[LIVM]-V-T-R.

NAME: Dynein light chain type 1 signature. 

CONSENSUS: H-x-I-x-G-[KR]-x-F-[GA]-S-x-V-[ST]-[HY]-E.

NAME: FtsZ protein signature 1. 

CONSENSUS: N-[ST]-D-x-Q-x-L-x(16,18)-G-x-G-[ATV]-G-[GSAN]-x-P-x(2)-G.
NAME: FtsZ protein signature 2. 

CONSENSUS: [DNHKR]-[LIVMF]-x-[LIVMF](2)-[VSTAC]-[STAC]-G-x-G-[GK]-G-T-G-[ST]-G-
CONSENSUS: [GSAR]-[STA]-P-[LIVMFT]-[LIVMF]-[SGAV].
NAME: Fungal hydrophobic signature.

CONSENSUS: [GN]-[DNQPSA]-x-C-[GSTANK]-[GSTADNQ]-[STNQI]-[PTIV]-x-C-C-[DENQKPST].

NAME: Intermediate filaments signature. 

CONSENSUS: [IV]-x-[TACI]-Y-[RKH]-x-[LM]-L-[DE] . 

NAME: Involucrin signature. 

CONSENSUS: < M-S-[QH]-Q-x-T-[LV]-P-V-T-[LV] . 
NAME: Kinesin motor domain signature. 

CONSENSUS: [GSA]-[KRHPSTQVM]-[LIVMF]-x-[LIVMF]-[IVC]-D-L-[AH]-G-[SAN]-E.
NAME: Kinesin motor domain profile. 
NAME: Kinesin light chain repeat. 

CONSENSUS: [DEQR]-A-L-x(3)-[GEQ]-x(3)-G-x-[DNS]-x-P-x-V-A-x(3)-N-x-L-[AS]-
CONSENSUS: x(5)-[QR]-x-[KR]-[FY]-x(2)-[AV]-x(4)-[HKNQ].

NAME: Myelin basic protein signature. 
CONSENSUS: V-V-H-F-F-K-N. 



NAME: Myelin P0 protein signature.

CONSENSUS: S-[KR]-S-x-K-[AG]-x-[SA]-E-K-K-[STA]-K.

NAME: Myelin proteolipid protein signature 1 . 
CONSENSUS: G-[MV]-A-L-F-C-G-C-G-H. 

NAME: Myelin proteolipid protein signature 2. 

CONSENSUS: C-x-[ST]-x-[DE]-x(3)-[ST]-[FY]-x-L-[FY]-I-x(4)-G-A.

NAME: Neuromodulin (GAP-43) signature 1. 
CONSENSUS: <M-L-C-C-[LIVM]-R-R.

NAME: Neuromodulin (GAP-43) signature 2. 
CONSENSUS: S-F-R-G-H-I-x-R-K-K-[LIVM] . 

NAME: Osteopontin signature. 

CONSENSUS: [KQ]-x-[TA]-x(2)-[GA]-S-S-E-E-K.

NAME: Peripherin / rom-1 signature. 

CONSENSUS: D-[GS]-V-P-F-[ST]-C-C-N-P-x-S-P-R-P-C. 

NAME: Profilin signature.

CONSENSUS: <x(0,1)-[STA]-x(0,1)-W-[DENQH]-x-[YI]-x-[DEQ].

NAME: Surfactant associated polypeptide SP-C palmitoylation sites.
CONSENSUS: I-P-C-C-P-V. 

NAME: Synapsins signature 1 . 
CONSENSUS: L-R-R-R-L-S-D-S. 

NAME: Synapsins signature 2. 
CONSENSUS: G-H-A-H-S-G-M-G-K-V-K.

NAME: Synaptobrevin signature. 

CONSENSUS: N-[LIVM]-[DENS]-[KL]-V-x-[DEQ]-R-x(2)-[KR]-[LIVM]-[STD
CONSENSUS: [KR]-[TA]-[DE]. 

NAME: Synaptophysin / synaptoporin signature. 
CONSENSUS: L-S-V-[DE]-C-x-N-K-T. 

NAME: Tropomyosins signature. 
CONSENSUS: L-K-E-A-E-x-R-A-E. 

NAME: Tubulin subunits alpha, beta, and gamma signature. 
CONSENSUS: [SAG]-G-G-T-G-[SA]-G.

NAME: Tubulin-beta mRNA autoregulation signal.
CONSENSUS: < M-R-[DE]-[TL]. 

NAME: Tau and MAP proteins tubulin-binding domain signature. 
CONSENSUS: G-S-x(2)-N-x(2)-H-x-[PA]-[AG]-G(2).

NAME: Neuraxin and MAP1B proteins repeated region signature. 
CONSENSUS: [STAGDN]-Y-x-Y-E-x(2)-[DE]-[KR]-[STAGCI].

NAME: F-actin capping protein alpha subunit signature 1 . 
CONSENSUS: V-H-[FY](2)-E-D-G-N-V.

NAME: F-actin capping protein alpha subunit signature 2. 
CONSENSUS: F-K-[AE]-L-R-R-x-L-P. 

NAME: F-actin capping protein beta subunit signature. 
CONSENSUS: C-D-Y-N-R-D. 

NAME: Vinculin family talin-binding region signature. 

CONSENSUS: [KR]-x-[LIVMF]-x(3)-[LIVMA]-x(2)-[LIVM]-x(6)-R-Q-Q-E-L.

NAME: Vinculin repeated domain signature. 
CONSENSUS: [LIVM]-x-[QA]-A-x(2)-W-[IL]-x-[DN]-P.

NAME: Amyloidogenic glycoprotein extracellular domain signature.
CONSENSUS: G-[VT]-E-[FY]-V-C-C-P.

NAME: Amyloidogenic glycoprotein intracellular domain signature. 
CONSENSUS: G-Y-E-N-P-T-Y-[KR]. 

NAME: Cadherins extracellular repeated domain signature. 
CONSENSUS: [LIV]-x-[LIV]-x-D-x-N-D-[NH]-x-P. 

NAME: Insect cuticle proteins signature. 

CONSENSUS: G-x(7)-[DEN]-G-x(6)-Y-x-A-[DNG]-x(2,3)-G-[FY]-x-[AP].

NAME: Gas vesicles protein GVPa signature 1 . 
CONSENSUS: [UVM]-x-[DEMLIVMFYT]-tXrVMH^^^ 

NAME: Gas vesicles protein GVPa signature 2. 

CONSENSUS: R-[LIVA](3)-A-[GS]-[LIVMFY]-x-T-x(3)-Y-[AG].

NAME: Gas vesicles protein GVPc repeated domain signature. 
CONSENSUS: F-L-x(2)-T-x(3)-R-x(3)-A-x(2)-Q-x(3)-L-x(2)-F. 

NAME: Bacterial microcompartiments proteins signature. 

CONSENSUS: D-x(0,1)-M-x-K-[SAG](2)-x-[IV]-x-[LIVM]-[LIVMA]-[GCS]-x(4)-[GD]-[SGPD]-
CONSENSUS: [GA]. 

NAME: Flagella basal body rod proteins signature. 

CONSENSUS: [GTARYQ]-x(9)-[LIVMYSTA](2)-[GSTA]-[STADEN]-N-[LIVM]-[SAN]-N-x-[SADNFR]-
CONSENSUS: [STV]. 

NAME: Flagella transport protein fliP family signature 1 . 
CONSENSUS: [PA]-A-[r^-x-[LIVTHSra]-[EQHLIl^ 

NAME: Flagella transport protein fliP family signature 2. 
CONSENSUS: P-[LIVMF]-K-[LIVMF](5)-x-[LIVMA]-[DNGS]-G-W.

NAME: Plant viruses icosahedral capsid proteins 'S' region signature. 

CONSENSUS: [FYW]-x-[PSTA]-x(7)-G-x-[LIVM]-x-[LIVM]-x-[FYWI]-x(2)-D-x(5)-P.

NAME: Potexviruses and carlaviruses coat protein signature. 

CONSENSUS: [RK]-[FYW]-A-[GAP]-F-D-x-F-x(2)-[LV]-x(3)-[GAST](2).

NAME: Neurotransmitter-gated ion-channels signature. 
CONSENSUS: C-x-[LIVMFQ]-x-[LIVMF]-x(2)-[FY]-P-x-D-x(3)-C.

NAME: ATP P2X receptors signature. 

CONSENSUS: G-G-x-[LIVM]-G-[LIVM]-x-[IV]-x-W-x-C-[DN]-L-D-x(5)-C-x-P-x-Y-x-F.
NAME: G-protein coupled receptors signature. 

CONSENSUS: [GSTALIVMFYWC]-[GSTANCPDE]-{EDPKRH}-x(2)-[LIVMNQGA]-x(2)-[LIVMFT]-
CONSENSUS: [GSTANC]-[LIVMFYWSTAC]-[DENH]-R-[FYWCSH]-x(2)-[LIVM].

NAME: G-protein coupled receptors family 2 signature 1 . 

CONSENSUS: C-x(3)-[Fi^IV]-D-x(3,4)-C-[FV^-x(2)-[STAGV]-x(8,9)-C-[PF].

NAME: G-protein coupled receptors family 2 signature 2. 
CONSENSUS: CKHLMFCA]-[LrVMFTHLr\H-x-[LWFST^^ 
NAME: G-protein coupled receptors family 3 signature 1.

CONSENSUS: [LV]-x-N-[LIVM](2)-x-L-F-x-I-[PA]-Q-[LIVM]-[STA]-x-[STA](3)-[STAN].

NAME: G-protein coupled receptors family 3 signature 2. 
CONSENSUS: C-C-[FYW]-x-C-x(2K-x(4)-[F^ 

NAME: G-protein coupled receptors family 3 signature 3. 
CONSENSUS: F-N-E-[STA]-K-x-I-[STAG]-F-[ST]-M.

NAME: Visual pigments (opsins) retinal binding site. 

CONSENSUS: [LIVMWAC]-[PGAC]-x(3)-[SAC]-K-[STALIMR]-[GSACPNV]-[STACP]-x(2)-[DENF]- 
CONSENSUS: [AP]-x(2)-0Y]. 

NAME: Bacterial rhodopsins signature 1. 

CONSENSUS: R-Y-x-[DT]-W-x-[LIVMF]-[S*Il-T-P-[LIVM](3). 
NAME: Bacterial rhodopsins retinal binding site. 

CONSENSUS: [FYIV]-x-[FYVG]-[LIVM]-D-[LIVMF]-x-[STA]-K-x(2)-[FY].

NAME: Receptor tyrosine kinase class II signature. 
CONSENSUS: [DN]-[LIV]-Y-x(3)-Y-Y-R.

NAME: Receptor tyrosine kinase class III signature. 
CONSENSUS: G-x-H-x-N-[LIVM]-V-N-L-L-G-A-C-T. 

NAME: Receptor tyrosine kinase class V signature 1. 

CONSENSUS: F-x-[DN]-x-[GAW]-[GA]-C-[LIVM]-[SA]-[LIVM](2)-[SA]-[LV]-[KRHQ]-[LIVA]-
CONSENSUS: x(3)-[KR]-C-[PSAW]. 

NAME: Receptor tyrosine kinase class V signature 2. 

CONSENSUS: C-x(2)-[DE]-G-[DEQ]-W-x(2,3)-[PAQ]-[LIVMT]-[GT]-x-C-x-C-x(2)-G-[HFY]-
CONSENSUS: [EQ].

NAME: Growth factor and cytokines receptors family signature 1 . 
CONSENSUS: C-[LVFYR]-x(7,8)-[STIVDN]-C-x-W. 

NAME: Growth factor and cytokines receptors family signature 2. 
CONSENSUS: [STGL]-x-W-[SG]-x-W-S.

NAME: TNFR/NGFR family cysteine-rich region signature.

CONSENSUS: C-x(4,6)-[FYH]-x(5,10)-C-x(0,2)-C-x(2,3)-C-x(7,11)-C-x(4,6)-[DNEQSKP]-
CONSENSUS: x(2)-C. 

NAME: TNFR/NGFR family cysteine-rich region domain. 

NAME: Integrins alpha chain signature. 
CONSENSUS: [FYWS]-[RK]-x-G-F-F-x-R. 

NAME: Integrins beta chain cysteine-rich domain signature. 
CONSENSUS: C-x-[GNQ]-x(1,3)-G-x-C-x-C-x(2)-C-x-C.

NAME: Natriuretic peptides receptors signature. 
CONSENSUS: G-P-x-C-x-Y-x-A-A-x-V-x-R-x(3)-H-W. 

NAME: Photosynthetic reaction center proteins signature.

CONSENSUS: [NH]-x(4)-P-x-H-x(2)-[SAG]-x(11)-[SAGC]-x-H-[SAG](2).

NAME: Antenna complexes alpha subunits signature. 

CONSENSUS: [LIVFAG]-x-[GASV]-[LIVFA]-x-[IV]-H-x(3)-[LIVM]-[GSTAE]-[STANH]-x(1,3)-
CONSENSUS: [STN]-W-[LIVMFYW]. 

NAME: Antenna complexes beta subunits signature. 

CONSENSUS: [EQ]-x(4)-H-x(5)-[GSTA]-x(3)-[FY]-x(3)-[AG]-x(2)-[AV]-H-x(7)-P.

NAME: Photosystem I psaA and psaB proteins signature. 
CONSENSUS: C-D-G-P-G-R-G-G-T-C. 

NAME: Photosystem I psaG and psaK proteins signature. 

CONSENSUS: G-F-x-[LIVM]-x-[DEA]-x(2)-[GA]-x-[GTA]-[SA]-x-G-H-x-[LIVM]-[GA].

NAME: Phytochrome chromophore attachment site signature. 
CONSENSUS: [RGS]-[GSA]-[PV]-H-x-C-H-x(2)-Y. 

NAME: Phytochrome chromophore attachment site domain profile. 
NAME: Speract receptor repeated domain signature.

CONSENSUS: G-x(5)-G-x(2)-E-x(6)-W-G-x(2)-C-x(3)-[FYW]-x(8)-C-x(3)-G. 
NAME: TonB-dependent receptor proteins signature 1.

CONSENSUS: <x(10,115)-[DENF]-[ST]-[LIVMF]-[LIVSTEQ]-V-x-[AGP]-[STANEQPK].
NAME: TonB-dependent receptor proteins signature 2.

CONSENSUS: [LYGSTANE]-x(3)-[GSTAENQ]-x-[PGE]-R-x-[LIVFYWA]-x-[LIVMFTA]-[STAGNQ]-
CONSENSUS: [LIVMFYGTA]-x-[LIVMFYWGTADQ]-x-F>.

NAME: Transmembrane 4 family signature. 

CONSENSUS: G-x(3)-[LIVMF]-x(2)-[GSA]-[LIVMF](2)-G-C-x-[GA]-[STA]-x(2)-[EG]-x(2)-
CONSENSUS: [CWN]-[LIVM](2). 

NAME: Bacterial chemotaxis sensory transducers signature. 

CONSENSUS: R-T-E-[EQ]-Q-x(2)-[SA]-[LIVM]-x-[EQ]-T-A-A-S-M-E-Q-L-T-A-T-V.

NAME: ER lumen protein retaining receptor signature 1 . 
CONSENSUS: G-I-S-x-[KR]-x-Q-x-L-[FY]-x-[LIV](2)-F-x(2)-R-Y.

NAME: ER lumen protein retaining receptor signature 2. 
CONSENSUS: L-E-[SA]-V-A-I-[LM]-P-Q-L. 

NAME: Ephrins signature. 

CONSENSUS: [KRQ]-[LF]-[CST]-x-K-[IF]-Q-x-[FY]-[ST]-[PA]-x(3)-G-x-E-F-x(5)-[FY](2)-
CONSENSUS: x(2)-[SA].

NAME: Granulins signature. 

CONSENSUS: C-x-D-x(2)-H-C-C-P-x(4)-C.

NAME: HBGF/FGF family signature. 

CONSENSUS: G-x-L-x-[STAGP]-x(6,7)-[DE]-C-x-[FM]-x-E-x(6)-Y.
NAME: PTN/MK heparin-binding protein family signature 1. 

CONSENSUS: S-[DE]-C-x-[DE]-W-x-W-x(2)-C-x-P-x-[SN]-x-D-C-G-[LIVMA]-G-x-R-E-G.
NAME: PTN/MK heparin-binding protein family signature 2. 

CONSENSUS: C-[KR]-[LIVM]-P-C-N-W-K-K-x-F-G-A-[DE]-C-K-Y-x-F-[EQ]-x-W-G-x-C.

NAME: Nerve growth factor family signature. 

CONSENSUS: G-C-[KR]-G-[LIV]-[DE]-x(3)-[YW]-x-S-x-C.

NAME: Platelet-derived growth factor (PDGF) family signature. 
CONSENSUS: P-[PS]-C-V-x(3)-R-C-[GSTA]-G-C-C. 

NAME: Small cytokines (intercrine/chemokine) C-x-C subfamily signature. 

CONSENSUS: C-x-C-[LIVM]-x(5,6)-[LIVMFY]-x(2)-[RKSEQ]-x-[LIVM]-x(2)-[LIVM]-x(5)-
CONSENSUS: [SAG]-x(2)-C-x(3)-[EQ]-[LIVM](2)-x(9,10)-C-L-[DN].

NAME: Small cytokines (intercrine/chemokine) C-C subfamily signature. 

CONSENSUS: C-C-[LIFY^-x(5,6)-[LI]-x(4)-[LIVMF]-x(2)-[r^V]-x(6,8)-C-x(3,4)-[SAG]-

CONSENSUS: [LIVM](2)-[FL]-x(8)-C-[STA]. 

NAME: TGF-beta family signature. 

CONSENSUS: [LIVM]-x(2)-P-x(2)-[FY]-x(4)-C-x-G-x-C.
NAME: TNF family signature. 

CONSENSUS: [LV]-x-[LIVM]-x(3)-G-[LIVMF]-Y-[LIVMFY](2)-x(2)-[QEKHL]-[LIVMGT]-x-
CONSENSUS: [LIVMFY]. 

NAME: TNF family profile. 

NAME: Wnt-1 family signature. 

CONSENSUS: C-K-C-H-G-[LIVMT]-S-G-x-C. 

NAME: Interferon alpha, beta and delta family signature. 

CONSENSUS: [FYH]-[ITn-x-[GNRC]-[LIVM]-x(2)-[FY]-L-x(7)-[CY]-A-W.

NAME: Granulocyte-macrophage colony-stimulating factor signature. 
CONSENSUS: C-P-[LP]-T-x-E-[ST]-x-C.

NAME: Interleukin-1 signature.

CONSENSUS: [FC]-x-S-[ASLV]-x(2)-P-x(2)-[FYLIV]-[LI]-[SCA]-T-x(7)-[LIVM].
NAME: Interleukin-2 signature.

CONSENSUS: T-E-[LF]-x(2)-L-x-C-L-x(2)-E-L. 

NAME: Interleukins -4 and -13 signature. 

CONSENSUS: L-x-E-[LIVM](2)-x(4,5)-[LIVM]-[TL]-x(5,7)-C-x(4)-[IVA]-x-[DNS]-[LIVMA].

NAME: Interleukin-6 / G-CSF / MGF signature.
CONSENSUS: C-x(9)-C-x(6)-G-L-x(2)-[FY]-x(3)-L. 

NAME: Interleukin-7 and -9 signature. 
CONSENSUS: N-x-[LAP]-[SCT]-F-L-K-x-L-L. 

NAME: Interleukin-10 family signature. 

CONSENSUS: [GS]-C-x(2)-[LV]-x(2)-[LIVM](2)-x-F-Y-L-x(2)-V. 
NAME: LIF / OSM family signature. 

CONSENSUS: [PST]-x(4)-F-[NQ]-x-K-x(3)-C-x-[LF]-L-x(2)-Y-[HK].

NAME: Macrophage migration inhibitory factor family signature. 
CONSENSUS: [DE]-P-C-A-x(3)-[LIVM]-x-S-I-G-x-[LIVM]-G.

NAME: Adipokinetic hormone family signature. 
CONSENSUS: Q-[LV]-[NT]-[FY]-[ST]-x(2)-W.

NAME: Bombesin-like peptides family signature. 
CONSENSUS: W-A-x-G-[SHMLF]-M. 

NAME: Calcitonin / CGRP / IAPP family signature. 

CONSENSUS: C-[SAGDN]-[STN]-x(0,1)-[SA]-T-C-[VMA]-x(3)-[LYF]-x(3)-[LYF].

NAME: Corticotropin-releasing factor family signature.
CONSENSUS: [PQ]-x-[LIVM]-S-[UVM]-x(2MPST]-[^ 

NAME: Crustacean CHH/MIH/GIH neurohormones family signature. 
CONSENSUS: C-[DENK]-D-C-x-N-[LIV]-[FY]-R-x(7)-C-[KR]-x(2)-C.

NAME: Erythropoietin / thrombopoietin signature.
CONSENSUS: P-x(4)-C-D-x-R-[LIVM](2)-x-[KR]-x(14)-C.

NAME: Granins signature 1 . 

CONSENSUS: [DEMSN]-L-[SAN]-x(2)-[DE]-x-E-L.
NAME: Granins signature 2. 

CONSENSUS: C-[UVM](2)-E-[LrVM](2)^-[DN>^ 
NAME: Galanin signature. 

CONSENSUS: G-W-T-L-N-S-A-G-Y-L-L-G-P-H. 

NAME: Gastrin / cholecystokinin family signature. 
CONSENSUS: Y-x(0,1)-[GD]-[WH]-M-[DR]-F.

NAME: Glucagon / GIP / secretin / VIP family signature. 

CONSENSUS: [YH]-[STAIVGD]-[DEQ]-[AGF]-[LIVMSTE]-[FYLR]-x-[DENSTAK]-[DENSTA]-
CONSENSUS: [LIVMFYG]-x(9)-[KREQL]-[KRDENQL]-[LVFYWG]-[LIVQ].

NAME: Glycoprotein hormones alpha chain signature 1 . 
CONSENSUS: C-x-G-C-C-[FY]-S-R-A-[FY]-P-T-P.

NAME: Glycoprotein hormones alpha chain signature 2. 
CONSENSUS: N-H-T-x-C-x-C-x-T-C-x(2)-H-K. 

NAME: Glycoprotein hormones beta chain signature 1 . 
CONSENSUS: C-[STAGM]-G-[HFYL]-C-x-[ST].

NAME: Glycoprotein hormones beta chain signature 2. 

CONSENSUS: [PA]-V-A-x(2)-C-x-C-x(2)-C-x(4)-[STD]-[DEY]-C-x(6,8)-[PGSTAVM]-x(2)-C. 

NAME: Gonadotropin-releasing hormones signature. 
CONSENSUS: Q-H-[FYW]-S-x(4)-P-G. 

NAME: Insulin family signature. 

CONSENSUS: C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C.
NAME: Natriuretic peptides signature.
CONSENSUS: C-F-G-x(3)-D-R-I-x(3)-S-x(2)-G-C. 

NAME: Neurohypophysial hormones signature. 
CONSENSUS: C-[LIFY](2)-x-N-[CS]-P-x-G.

NAME: Neuromedin U signature. 
CONSENSUS: F-[LIVMF]-F-R-P-R-N. 

NAME: Endogenous opioids neuropeptides precursors signature. 

CONSENSUS: C-x(3)-C-x(2)^-x(2)-[KRrn-x(6.7)-[LIrl.tDhq-x(3)^-x-tUVM]-[EQ]-C- 
CONSENSUS: [EQ]-x(8)-W-x(2)<:. 

NAME: Pancreatic hormone family signature. 

CONSENSUS: [FY]-x(3)-[LIVM]-x(2)-Y-x(3)-[LIVMFY]-x-R-x-R-[YF].

NAME: Parathyroid hormone family signature. 
CONSENSUS: V-S-E-x-Q-x(2)-H-x(2)-G. 

NAME: Pyrokinins signature. 
CONSENSUS: F-[GSTV]-P-R-L-[G>].

NAME: Somatotropin, prolactin and related hormones signature 1 . 

CONSENSUS: C-x-[ST]-x(2)-[LIVMFY]-x-[LIVMSTA]-P-x(5)-[TALIV]-x(7)-[LIVMFY]-x(6)-
CONSENSUS: [LIVMFY]-x(2)-[STA]-W.

NAME: Somatotropin, prolactin and related hormones signature 2. 
CONSENSUS: C-[LIVMFY]-x(2)-D-[LrvMrcSTA]^^ 

NAME: Tachykinin family signature. 
CONSENSUS: F-[IVFY]-G-[LM]-M-[G>].

NAME: Thymosin beta-4 family signature. 
CONSENSUS: K-L-K-K-T-E-T-Q-E-K-N. 

NAME: Urotensin II signature. 
CONSENSUS: C-F-W-K-Y-C. 

NAME: Cecropin family signature. 

CONSENSUS: W-x(0,2)-[KDN]-x(2)-K-[KRE]-[LI]-E-[RKN].

NAME: Mammalian defensins signature. 
CONSENSUS: C-x-C-x(3,5)-C-x(7)-G-x-C-x(9)-C-C.

NAME: Arthropod defensins signature. 

CONSENSUS: C-x(2,3)-[HN]-C-x(3,4)-[GR]-x(2)-G-G-x-C-x(4,7)-C-x-C.
NAME: Cathelicidins signature 1 . 

CONSENSUS: Y-x-[ED]-x-V-x-[RQ]-A-[LIVMA]-[DQG]-x-[LIVMFY]-N-[EQ].
NAME: Cathelicidins signature 2. 

CONSENSUS: F-x-[LIVM]-K-E-T-x-C-x(10)-C-x-F-[KR]-[KE].

NAME: Endothelin family signature.
CONSENSUS: C-x-C-x(4)-D-x(2)-C-x(2)-[FY]-C.

NAME: Plant thionins signature. 
CONSENSUS: C-C-x(5)-R-x(2)-[FY]-x(2)-C. 

NAME: Gamma-thionins family signature. 

CONSENSUS: [KR]-x-C-x(3)-[SV]-x(2)-[FYWH]-x-[GF]-x-C-x(5)-C-x(3)-C.
NAME: Snake toxins signature. 

CONSENSUS: G-C-x(1,3)-C-P-x(8,10)-C-C-x(2)-[PDEN].
NAME: Myotoxins signature. 

CONSENSUS: K-x^-H-x-K-x(2)-H<Nx(2)-K-x(3K^ 
NAME: Scorpion short toxins signature. 

CONSENSUS: C-x(3)-C-x(6,9)-[GAS]-K-C-[IMQT]-x(3)-C-x-C.

NAME: Heat-stable enterotoxins signature. 
CONSENSUS: C-C-x(2)-C-C-x-P-A-C-x-G-C. 
NAME: Aerolysin type toxins signature.
CONSENSUS: [KT]-x(2)-N-W-x(2)-T-[DN]-T.

NAME: Shiga/ricin ribosomal inactivating toxins active site signature. 

CONSENSUS: [LIVMA]-x-[LIVMSTA](2)-x-E-[SAGV]-[STAL]-R-[FY]-[RKNQS]-x-[LIVM]-[EQS]-
CONSENSUS: x(2)-[LIVMF].

NAME: Channel forming colicins signature. 
CONSENSUS: T-x(2)-W-x-P-[LIVMFY](3)-x(2)-E.

NAME: Hok/gef family cell toxic proteins signature. 

CONSENSUS: [LIVMA](4)-C-[LIVMFA]-T-[LIVMA](2)-x(4)-[LIVM]-x-[RG]-x(2)-L-[CY].

NAME: Staphylococcal enterotoxin/Streptococcal pyrogenic exotoxin signature 1. 
CONSENSUS: Y-G-G-[LIV]-T-x(4)-N. 

NAME: Staphylococcal enterotoxin/Streptococcal pyrogenic exotoxin signature 2.
CONSENSUS: K-x(2)-[LIV]-x(4)-[LIV]-D-x(3)-R-x(2)-L-x(5)-[LIV]-Y.

NAME: Thiol-activated cytolysins signature.
CONSENSUS: [RK]-E-C-T-G-L-x-W-E-W-W-[RK]. 

NAME: Membrane attack complex components / perforin signature. 
CONSENSUS: Y-x(6)-[FY]-G-T-H-[FY].

NAME: Pancreatic trypsin inhibitor (Kunitz) family signature. 
CONSENSUS: F-x(3)-G-C-x(6)-[FY]-x(5)-C.

NAME: Bowman-Birk serine protease inhibitors family signature. 

CONSENSUS: C-x(5,6)-[DENQKRHSTA]-C-[PASTDH]-[PASTDK]-[ASTDV]-C-[NDKS]-[DEKRHSTA]-C.

NAME: Kazal serine protease inhibitors family signature. 
CONSENSUS: C-x(7)-C-x(6)-Y-x(3)-C-x(2,3)-C. 

NAME: Soybean trypsin inhibitor (Kunitz) protease inhibitors family signature. 
CONSENSUS: [LIVM]-x-D-x-[EDNTY]-[DG]-[RKHDENQ]-x-[LIVM]-x(5)-Y-x-[LIVM].

NAME: Serpins signature. 

CONSENSUS: [LIVMFY]-x-[LIVMFYAC]-[DNQ]-[RKHQS]-[PST]-F-[LIVMFY]-[LIVMFYC]-x-
CONSENSUS: [LIVMFAH]. 

NAME: Potato inhibitor I family signature. 

CONSENSUS: [FYW]-P-[EQH]-[LIV](2)-G-x(2)-[STAGV]-x(2)-A.

NAME: Squash family of serine protease inhibitors signature. 
CONSENSUS: C-P-x(5)-C-x(2)-D-x-D-C-x(3)-C-x-C. 

NAME: Streptomyces subtilisin-type inhibitors signature. 
CONSENSUS: C-x-P-x(2,3)-G-x-H-P-x(4)-A-C-[ATD]-x-L. 

NAME: Cysteine proteases inhibitors signature. 

CONSENSUS: [GSTEQKRV]-Q-[LWT]-[VAF]-[SAGQ]^-x-[LIVMNK]-x(2)-[LIVMFY]-x-[LIVMFYA]-
CONSENSUS: [DENQKRHSIV].

NAME: Tissue inhibitors of metalloproteinases signature.
CONSENSUS: C-x-C-x-P-x-H-P-Q-x-A-F-C. 

NAME: Cereal trypsin/alpha-amylase inhibitors family signature. 

CONSENSUS: C-x(4)-[SAGD]-x(4)-[SPAL]-[LF]-x(2)-C-[RH]-x-[LIVMFY](2)-x(3,4)-C.

NAME: Alpha-2-macroglobulin family thiol ester region signature. 
CONSENSUS: [PG]-x-[GS]-C-[GA]-E-[EQ]-x-[LIVM].

NAME: Disintegrins signature. 

CONSENSUS: C-x(2)-G-x-C-C-x-[NQRS]-C-x-[FM]-x(6)-C-[RK].

NAME: Lambdoid phages regulatory protein CIII signature.
CONSENSUS: E-S-x-L-x-R-x(2)-[KR]-x-L-x(4)-[KR](2)-x(2)-[DE]-x-L.

NAME: Chaperonins cpn60 signature. 
CONSENSUS: A-[AS]-x-[DEQ]-E-x(4)-G-G-[GA] . 

NAME: Chaperonins cpn10 signature.

CONSENSUS: [LIVMFY]-x-P-[ILT]-x-[DEN]-[KR]-[LIVMFA](3)-[KREQ]-x(8,9)-[SG]-x-
CONSENSUS: [LIVMFY](3).
NAME: Chaperonins TCP-1 signature 1.

CONSENSUS: [RKEL]-[ST]-x-[LMFY]-G-P-x-[GSA]-x-x-K-[LIVMF](2).
NAME: Chaperonins TCP-1 signature 2. 

CONSENSUS: [LIVM]-[TS]-[NK]-D-[GA]-[AVNHK]-[TAV]-[LIVM](2)-x(2)-[LIVM]-x-[LIVM]-x-
CONSENSUS: [SNH]-[PQH].

NAME: Chaperonins TCP-1 signature 3.

CONSENSUS: Q-[DEK]-x-x-[LIVMGTA]-[GA]-D-G-T. 

NAME: Heat shock hsp20 proteins family profile. 

NAME: Heat shock hsp70 proteins family signature 1. 
CONSENSUS: [IV]-D-L-G-T-[ST]-x-[SC].

NAME: Heat shock hsp70 proteins family signature 2. 

CONSENSUS: [LIVMF]-[LIVMFY]-[DN]-[LIVMFS]-G-[GSH]-[GS]-[AST]-x(3)-[ST]-[LIVM]-
CONSENSUS: [LIVMFC]. 

NAME: Heat shock hsp70 proteins family signature 3. 
CONSENSUS: [LIVMY]-x-[LIVMF>x-G^-x-[STl-x-^ 

NAME: Heat shock hsp90 proteins family signature. 
CONSENSUS: Y-x-[NQH]-K-[DE]-[IVA]-F-L-R-[ED]. 

NAME: Chaperonins clpA/B signature 1. 

CONSENSUS: D-[AI]-[SGA]-N-[LIVMF](2)-K-[PT]-x-L-x(2)-G.
NAME: Chaperonins clpA/B signature 2. 

CONSENSUS: R-[LIVMFY]-r>x^-E-[LIVMFY]-x-E-[KRQ]-x.[STA]-x-[STA]-[KR3-[LIVM]-x-G. 
CONSENSUS: [STA]. 

NAME: Nt-dnaJ domain signature. 

CONSENSUS: [rnn-x(2)-[UVMA]-x(3)-[FYWHNT]-P 

NAME: dnaJ domain profile. 

NAME: CXXCXGXG dnaJ domain signature. 

CONSENSUS: C-[DEGSTHKR]-x-C-x-G-x-[GK]-[AGSDM]-x(2)-[GSNKR]-x(4,6)-C-x(2,3)-C-x-G-x-G.
NAME: grpE protein signature. 

CONSENSUS: [FL]-[DN]-[PHEA]-x(2)-[HM]-x-A-[LIVMTN]-x(16,20)-G-[FY]-x(3)-[DEG]-x(2)-
CONSENSUS: [LIVM]-[RI]-x-[SA]-x-V-x-[IV].

NAME: Bacterial type II secretion system protein C signature. 

CONSENSUS: P-x(6)-F-x(4)-L-x(3)-D-[LIVM]-A-[LIVM]-x-[LIVM]-N-x-[LIVM]-x-L.
NAME: Bacterial type II secretion system protein D signature. 

CONSENSUS: [GR]-[DEQKG]-[STVM]-[LIVMA](3)-[GA]-G-[LIVMFY]-x(11)-[LIVM]-P-
CONSENSUS: [LIVMFYWGSJ-[LrVMFHGSAE]-x-[LrV^ 

NAME: Bacterial type II secretion system protein E signature. 
CONSENSUS: [LIVM]-R-x(2)-P-D-x-[LIVM](3)-G-E-[LIVM]-R-D.

NAME: Bacterial type II secretion system protein F signature. 

CONSENSUS: [KRQ]-[LIVMA]-x(2)-[SAIV]-[LIVM]-x-[TY]-P-x(2)-[LIVM]-x(3)-[STAGV]-x(6)-
CONSENSUS: [LMY]-x(3)-[LIVMF](2)-P.

NAME: Bacterial type II secretion system protein N signature. 
CONSENSUS: G-T-L-W-x-G-x(11)-L-x(4)-W.

NAME: Bacterial export FHIPEP family signature.

CONSENSUS: R-[LIVM]-[GSA]-E-V-[GSA]-A-R-F-[STV]-L-D-[GSA]-M-P-G-K-Q-M-[GSA]-I-D-
CONSENSUS: [GSA]-D. 

NAME: Protein secA signatures. 

CONSENSUS: [IV]-x-[IV]-[SA]-T-[NQ]-M-A-G-R-G-x-D-I-x-L. 

NAME: Protein secY signature 1 . 

CONSENSUS: [GST]-[UVMF](2)-x-[UVM]^-[^ 

CONSENSUS: [LIVMFAT](3)-Q-[UVMFA](2). 
NAME: Protein secY signature 2.

CONSENSUS: [LIVMFYW](2H-[DE]-x-[LIVM^-[STN]-x(2)^-[UVMF]-[GST3-[NST]-G.x-tGST]- 
CONSENSUS: [LIVMF](3). 

NAME: Protein secE/sec61-gamma signature.

CONSENSUS: [LIVMFY]-x(2)-[DENQGA]-x(4)-[LIVMTA]-x-[KRV]-x(2)-[KW]-P-x(3)-[SEQ]-x(7)-
CONSENSUS: [LIVT]-[LIVGA]-[LIVFGAST].

NAME: Gram-negative pili assembly chaperone signature.
CONSENSUS: [LIVMFYMAPNhx-[DNS]-[KREQ]-E-[STW 
CONSENSUS: x(2)-[LIVM]-P-[PAS]. 

NAME: Fimbrial biogenesis outer membrane usher protein signature.

CONSENSUS: [VL]-[PASQ]-[PAS]-G-[PAD]-[FY]-x-[LI]-[DNQSTAP]-[DNH]-[LIVMFY].

NAME: SRP54-type proteins GTP-binding domain signature. 

CONSENSUS: P-[XJVM]-x-[r^L]-tLIVMAT)-[GS]-x-[GS)-[E(a-x(4)-[LIVMF]. 

NAME: Cytochrome c oxidase assembly factor COX10/ctaB/cyoE signature.
CONSENSUS: [ED]-x-D-x(2)-M-x-R-T-x(2)-R-x(4)-G. 

NAME: Cyclin-dependent kinases regulatory subunits signature 1.

CONSENSUS: Y-S-x-[KR]-Y-x-[DE](2)-x-[FY]-E-Y-R-H-V-x-[LV]-[PT]-[KRP].

NAME: Cyclin-dependent kinases regulatory subunits signature 2. 
CONSENSUS: H-x-P-E-x-H-[IV]-L-L-F-[KR]. 

NAME: Pentaxin family signature. 
CONSENSUS: H-x-C-x-[ST]-W-x-[ST].

NAME: Immunoglobulins and major histocompatibility complex proteins signature. 
CONSENSUS: [FY]-x-C-x-[VA]-x-H. 

NAME: Prion protein signature 1. 

CONSENSUS: A-G-A-A-A-A-G-A-V-V-G-G-L-G-G-Y. 
NAME: Prion protein signature 2. 

CONSENSUS: E-x-[ED]-x-K-[LIVM](2)-x-[KR]-[LIVM](2)-x-[QE]-M-C-x(2)-Q-Y.
NAME: Cyclins signature. 

CONSENSUS: R-x(2)-[LIVMSA]-x(2)-[FYWS]-[LIVM]-x(8)-[LIVMFC]-x(4)-[LIVMFYA]-x(2)-
CONSENSUS: [STAGC]-[LIVMFYQ]-x-[LIVMFYC]-[LIVMFY]-D-[RKH]-[LIVMFYW].

NAME: Proliferating cell nuclear antigen signature 1.

CONSENSUS: [GA]-[LIVMF]-x-[LIVMA]-x-[SAV]-[LIVM]-D-x-[NSAE]-[HKR]-[VI]-x-[LY]-
CONSENSUS: [VGA]-x-[LIVM]-x-[LIVM]-x(4)-F. 

NAME: Proliferating cell nuclear antigen signature 2. 

CONSENSUS: [RKA]-C-[DE]-[RH]-x(3)-[LIVMF]-x(3)-[LIVM]-x-[SGAN]-[LIVMF]-x-K-
CONSENSUS: [LIVMF](2). 

NAME: Actin-depolymerizing proteins signature.
CONSENSUS: P-PE]-x-[SA]-x-[LrA^-[^x-[KR]^ 
CONSENSUS: [KR]. 

NAME: BCL2-like apoptosis inhibitors (spans part of BH3, BH1 and BH2). 

NAME: Apoptosis regulator, Bcl-2 family BH1 domain signature.
CONSENSUS: [L\^-[FT]-x-[GSDMG^x(l,2MNS]-[Y^ 
CONSENSUS: [LIVMF3(2)-x-F-[GSAEHGSARY]. 

NAME: Apoptosis regulator, Bcl-2 family BH2 domain signature. 
CONSENSUS: W-[LIM]-x(3)-[GR]-G-[WQ]-[DENSAV]-x-[FLGA]-[LIVFTC].

NAME: Apoptosis regulator, Bcl-2 family BH3 domain signature. 

CONSENSUS: [LIVA^^-x(3)-L-[KARQ^x-[IVAL]-G-I^[DESG]-[LIMFV]-[DENSH(M-[LVSHRQ]- 
CONSENSUS: [NSR]. 

NAME: Apoptosis regulator, Bcl-2 family BH4 domain signature. 
CONSENSUS: [DS]-[rTn-R-[AEHU]-V-x-[KD]-[F^ 
CONSENSUS: [HY]-x-[CW]. 

NAME: Apoptosis regulator, Bcl-2 family BH4 domain profile. 
CONSENSUS: [FY]-R-Y-G-x-[DE](2)-x-[DE]-[LIVM](2)^.[LIVM]-x-F-x-[RK]-[DEQ]-[LIVM].
NAME: AAA-protein family signature.

CONSENSUS: [LIVMT]-x-[LIVMT]-[LIVMF]-x-[GATMC]-[ST]-[NS]-x(4)-[LIVM]-D-x-A-[LIFA]-
CONSENSUS: x-R. 

NAME: Ubiquitin domain signature. 

CONSENSUS: K-x(2)-[LIVM]-x-[DESAK]-x(3)-[LIVM]-[PA]-x(3)-Q-x-[LIVM]-[LIVMC]-
CONSENSUS: [LIVMFY]-x-G-x(4)-[DE].

NAME: Ubiquitin domain profile. 

NAME: ADP-ribosylation factors family signature. 

CONSENSUS: [HRQT]-x-[FYWI]-x-[LIVM]-x(4)-A-x(2)-G-x(2)-[LIVM]-x(2)-[GSA]-[LIVMF]-x-
CONSENSUS: [WK]-[LIVM]. 

NAME: GTP-binding nuclear protein ran signature. 
CONSENSUS: D-T-A-G-Q-E-K-[LF]-G-G-L-R-[DE]-G-Y-Y. 

NAME: SAR1 family signature.

CONSENSUS: R-x-[LIVM]-E-V-F-M-C-S-[LIVM](2)-x-[KRQ]-x-G-Y-x-E-[AG]-[FI]-x-W-[LIVM]-
CONSENSUS: x-Q-Y. 

NAME: Band 7 protein family signature. 

CONSENSUS: R-x(2)-[LIV]-[SAN]-x(6)-[LIV]-D-x(2)-T-x(2)-W-G-[LIV]-[KRH]-[LIV]-x-
CONSENSUS: [KR]-[LIV]-E-[LIV]-[KR].

NAME: Trp-Asp (WD) repeats signature. 

CONSENSUS: [LIVMSTAC]-[LIVMFYWSTAGC]-[LIMSTAG]-[LIVMSTAGC]-x(2)-[DN]-x(2)-
CONSENSUS: [LIVMWSTAC]-x-[LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN].

NAME: G-protein gamma subunit profile. 

NAME: Ras GTPase-activating proteins signature.
CONSENSUS: [GS^-x-[LIVMF]-(FY]-[LIVMFY]-R-[U 
CONSENSUS: [SGAN]-P. 

NAME: Ras GTPase-activating proteins profile. 

NAME: Guanine-nucleotide dissociation stimulators CDC24 family signature. 

CONSENSUS: L-x(2)-[LIVMFW]-L*x(2^P*[LIVM]-x(2)-[LIVM]-x-[KRS]^(2)-L-x-[UVMJ-x^ 

CONSENSUS: [DEQJ-[LIVM]-x(3)-[ST]. 

NAME: Guanine-nucleotide dissociation stimulators CDC25 family signature. 
CONSENSUS: [GAP]-[CT]-V-P-[FY]-x(4)-[LIVMFY]-x-[DN]-[LIVM].

NAME: MARCKS family signature 1 . 
CONSENSUS: G-Q-E-N-G-H-V-[KR]. 

NAME: MARCKS family phosphorylation site domain. 

CONSENSUS: E-T-P-K(5)-x(0,1)-F-S-F-K-K-x-F-K-L-S-G-x-S-F-K-[KR]-[NS]-[KR]-K-E.

NAME: Stathmin family signature 1. 

CONSENSUS: P-[KQ]-[KR](2)-[DE]-x-S-L-[EG]-E.

NAME: Stathmin family signature 2. 
CONSENSUS: A-E-K-R-E-H-E-[KR]-E-V. 

NAME: GTP-binding elongation factors signature. 

CONSENSUS: r>[KRSTGANQFlTW]-x(3)-E-[KRAQ]-x-[RKQD]-[GC]-[IVMK]-[ST]-[IV]-x(2)- 
CONSENSUS: [GSTACKRNQ].

NAME: Elongation factor 1 beta/beta'/delta chain signature 1.
CONSENSUS: [DE]-[DEG]-[DE](2)-[LIVMF]-D-L-F-G. 

NAME: Elongation factor 1 beta/beta'/delta chain signature 2.
CONSENSUS: V-Q-S-x-D-[LIVM]-x-A-[FWM]-[NQ]-K-[LIVM].

NAME: Elongation factor 1 gamma chain profile. 

NAME: Elongation factor Ts signature 1.

CONSENSUS: L-R-x(2)-T-[GDQ]-x-[GS]-[LIVMF]-x(0,1)-[DENKAC]-x-K-[KRNEQS]-[AV]-L.
NAME: Elongation factor Ts signature 2.

CONSENSUS: E-[LIVM]-N-[SCV]-[QE]-T-D-F-V-[SA]-[KRN].
NAME: Elongation factor P signature. 

CONSENSUS: K-x-A-x(4)-G-x(2)-[LIV]-x-V-P-x(2)-[LIV]-x(2)-G.
NAME: Eukaryotic initiation factor 1A signature. 

CONSENSUS: [IM]-x-G-x-[GS]-[KRH]-x(4)-[CL]-x-D-G-x(2)-R-x(2)-[RH]-I-x-G.
NAME: Eukaryotic initiation factor 4E signature. 

CONSENSUS: [DE]-[IFY]-x(2)-F-[KR]-x(2)-[LIVM]-x-P-x-W-E-[DV]-x(5)-G-G-[KR]-W.

NAME: Eukaryotic initiation factor 5A hypusine signature. 
CONSENSUS: [PT]-G-K-H-G-x-A-K. 

NAME: Initiation factor 2 signature. 

CONSENSUS: G-x-[LIVM]-x(2)-L-[KR]-[KRHNS]-x-K-x(5)-[LIVM]-x(2)-G-x-[DEN]-C-G.
NAME: Initiation factor 3 signature. 

CONSENSUS: [KR]-[LIVM](2)-[DN]-[FY]-[GS]-[KR]-[LIVMFYS]-x-[FY]-[DEQT]-x(2)-[KR].

NAME: Translation initiation factor SUI1 signature. 

CONSENSUS: [LIVM]-[EQ]-[LIVM]-Q-G-[DEN]-[KHQ]-[KRV].

NAME: Prokaryotic-type class I peptide chain release factors signature. 
CONSENSUS: [AR]-[STA]-x-G-x-G-G-Q-[HNGCS]-V-N-x(3)-[ST]-A-[IV].

NAME: Transcription termination factor nusG signature. 
CONSENSUS: [LIVM]-F-G-[KRW]-x-T-P-[IV]-x-[LIVM].

NAME: Calponin family repeat. 

CONSENSUS: [LFVM]-x4W]^[MAS]-G-[STY]-rNT]-CKRQ]-x(2)-[STN]-Q-x-G-x(3.4)-G. 

NAME: CAP protein signature 1. 

CONSENSUS: [LIVM](2)-x-R-L-[DE]-x(4)-R-L-E. 

NAME: CAP protein signature 2. 

CONSENSUS: D-[LIVMFY]-x-E-x-[PA]-x-P-E-Q-[LIVMFY]-K. 
NAME: Calreticulin family signature 1. 

CONSENSUS: [KRHN]-x-[DEQN]-[DEQNK]-x(3)-C-G-G-[AG]-[FY]-[LIVM]-[KN]-[LIVMFY](2).

NAME: Calreticulin family signature 2. 
CONSENSUS: [LIVM](2)-F-G-P-D-x-C-[AG].

NAME: Calreticulin family repeated motif signature. 

CONSENSUS: [IV]-x-D-x-[DENST]-x(2)-K-P-[DEH]-D-W-[DEN].

NAME: Calsequestrin signature 1.

CONSENSUS: [EQ]-[DE]-G-L-[DN]-F-P-x-Y-D-G-x-D-R-V.
NAME: Calsequestrin signature 2. 

CONSENSUS: [DE]-L-E-D-W-[LIVM]-E-D-V-L-x-G-x-[LIVM]-N-T-E-D-D-D.
NAME: S-100/ICaBP type calcium binding protein signature. 

CONSENSUS: [LIVMFYW](2)-x(2)-[LK]-D-x(3)-[DN]-x(3)-[DNSG]-[FY]-x-[ES]-[FYVC]-x(2)-
CONSENSUS: [LIVMFS]-[LIVMF].

NAME: Hemolysin-type calcium-binding region signature.
CONSENSUS: D-x-[LI]-x(4)-G-x-D-x-[LI]-x-G-G-x(3)-D.

NAME: HlyD family secretion proteins signature. 

CONSENSUS: [LIVM]-x(2)-G-[LM]-x(3)-[STGAV]-x-[LIVMT]-x-[LIVMT]-[GE]-x-[KR]-x-
CONSENSUS: [LIVMFYW](2)-x-[LIVMFYW](3).

NAME: P-II protein urydylation site.
CONSENSUS: Y-[KR]-G-[AS]-[AE]-Y. 

NAME: P-II protein C-terminal region signature. 

CONSENSUS: [ST]-x(3)-G-[DY]-G-[KR]-[IV]-[FW]-[LIVM]-x(2)-[LIVM].
NAME: 14-3-3 proteins signature 1. 

CONSENSUS: R-N-L-[LIV]-S-[VG]-[GA]-Y-[KN]-N-[IVA].
NAME: 14-3-3 proteins signature 2.
CONSENSUS: Y-K-[DE]-S-T-L-I-[IM]-Q-L-^ 

NAME: ATP1G1 / PLM / MAT8 family signature. 

CONSENSUS: [DNS]-x-F-x-Y-D-x(2)-[ST]-[LIVM]-[RQ]-x(2)-G.

NAME: BTG1 family signature 1. 

CONSENSUS: Y-x(2)-[HP]-W-[FY]-[AP]-E-x-P-x-K-G-x-[GA]-[FY]-R-C-[IV]-[RH]-[IV].
NAME: BTG1 family signature 2. 

CONSENSUS: [LV]-P-x-[DE]-[LM]-[ST]-[LIVM]-W-[IV]-D-P-x-E-V-[SC]-x-[RQ]-x-G-E.
NAME: Cullin family signature. 

CONSENSUS: [LIV]-K-x(2)-[LIV]-x(2)-L-I-[DEQ]-[KRHNQ]-x-Y-[LIVM]-x-R-x(6,7)-[FY]-x-
CONSENSUS: Y-x-[SA]>. 

NAME: Cullin family profile. 

NAME: Enhancer of rudimentary signature. 

CONSENSUS: Y-D-I-[SA]-x-L-[FY]-x-F-[IV]-D-x(3)-D-[LIV]-S.

NAME: G10 protein signature 1.

CONSENSUS: L-C-C-x-[KR]-C-x(4)-[DE]-x-N-x(4)-C-x-C-R-V-P.

NAME: G10 protein signature 2. 

CONSENSUS: C-x-H-C-G-C-[KRH]-G-C-[SA].

NAME: Glucokinase regulatory protein family signature. 

CONSENSUS: G-[PA]-E-x-[LIV]-[STA]-G-S-[ST]-R-[LIVM]-K-[STGA](3)-x(2)-K.
NAME: GTP1/OBG family signature. 

CONSENSUS: D-[LIVM]-P-G-[LIVM](2)-[DEY]-[GN]-A-x(2)-G-x-G.
NAME: HIT family signature. 

CONSENSUS: [NQA]-x(4MGAV]-x4QF]-x-[LIVM]-x-H-[LW^ 
CONSENSUS: [PSGA]. 

NAME: Caseins alpha/beta signature. 
CONSENSUS: C-L-[LV]-A-x-A-[LVF]-A. 

NAME: Clathrin adaptor complexes medium chain signature 1. 

CONSENSUS: [IVT]-[GSP]-W-R-x(2,3)-[GAD]-x(2)-[HY]-x(2)-N-x-[LIVMAFY](3)-D-[LIVM]-
CONSENSUS: [LIVMT]-E.

NAME: Clathrin adaptor complexes medium chain signature 2. 
CONSENSUS: [LIV]-x-F-I-P-P-x-G-x-[LIVMFY]-x-L-x(2)-Y.

NAME: Clathrin adaptor complexes small chain signature. 
CONSENSUS: [LIVM](2)-Y-[KR]-x(4)-L-Y-F.

NAME: Ependymins signature 1. 

CONSENSUS: F-E-E-G-x-[LIVMF]-Y-[ED]-I-D-x(2)-N-[QE]-S-C-[RKH](2).
NAME: Ependymins signature 2. 

CONSENSUS: [QE]-[LIVMA]-F-x(2)-P-[STA]-[FY]-C-[DE]-[GA]-[LIVM]-x(2)-[DE](2).
NAME: Syntaxin / epimorphin family signature. 

CONSENSUS: [RQ]-x(3)-[LIVMA]-x(2)-[LIVM]-[ESH]-x(2)-[LIVMT]-x-[DEVM]-[LIVM]-x(2)-
CONSENSUS: [LIVM]-[FS]-x(2)-[LIVM]-x(3)-[LIVT]-x(2)-Q-[GADEQ]-x(2)-[LIVM]-[DNQT]-x-
CONSENSUS: [LIVMF]-[DESV]-x(2)-[LIVM].

NAME: Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 signature 1.
CONSENSUS: [GDER]-H-[FYWH]-T-Q-[LIVM](2)-W-x(2)-[STN].

NAME: Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 signature 2.

CONSENSUS: [LIVMFYH]-[LIVMFY]-x-C-[NQRHS]-Y-x-[PARH]-x-[GL]-N-[LIVMFYWDN].
NAME: Fetuin family signature 1. 

CONSENSUS: C-x(56)-C-x(10)-C-x(13)-C-x( 17, m-C-x(13)-C-x(2yC-x(5Z>C-x(lQ, 1 1)- 
CONSENSUS: C-x( 10, 12)-C-x< 16.22)-C. 

NAME: Fetuin family signature 2. 
CONSENSUS: L-E-T-x-C-H-x-L-D-P-T-P.
NAME: Legume lectins beta-chain signature.
CONSENSUS: [LIV]-[STAG]-V-[DEQV]-[FLI]-D-[ST]. 

NAME: Legume lectins alpha-chain signature. 

CONSENSUS: [LIV]-x-[EDQ]-[FYWKR]-V-x-[LIV]-G-[LF]-[ST].

NAME: Vertebrate galactoside-binding lectin signature. 

CONSENSUS: W-[GEK]-x-[EQ]-x-[KRE]-x(3,6)-[PCTF]-[LIVMF]-[NQEGSKV]-x-[GH]-x(3)-
CONSENSUS: [DENKHS]-[LIVMFC].

NAME: Lysosome-associated membrane glycoproteins duplicated domain signature. 
CONSENSUS: [STA]-C-[LIVM]-[LIVMFYW]-A-x-[LIVMFYW]-x(3)-[LIVMFYW]-x(3)-Y.

NAME: LAMP glycoproteins transmembrane and cytoplasmic domain signature. 

CONSENSUS: C-x(2)-D-x(3,4)-[LIVM](2)-P-[LIVM]-x-[LIVM]-G-x(2)-[LIVM]-x-G-[LIVM](2)-
CONSENSUS: x-[LIVM](4)-A-[FY]-x-[LIVM]-x(2)-[KR]-[RH]-x(1,2)-[STAG](2)-Y-[EQ].

NAME: Glycophorin A signature. 

CONSENSUS: M-x-[GAC]-V-M-A-G-[LIVM](2). 

NAME: PMP-22 / EMP / MP20 family signature 1.

CONSENSUS: [LIVMF](4)-[SA]-T-x(2)-[DNKS]-x-W-x(9,13)-[LIV]-W-x(2)-C.

NAME: PMP-22 / EMP / MP20 family signature 2. 

CONSENSUS: [RQ]-[AV]-x-M-[IV]-L-S-x-[LI]-x(4)-[GSA]-[LIVMF](3).

NAME: Oxysterol-binding protein family signature. 
CONSENSUS: E-[KQ]-x-S-H-[HR]-P-P-x-[STACF]-A.

NAME: Yeast PIR proteins repeats signature. 

CONSENSUS: S-Q-[IV]-[STGNH]-D-G-Q-[LIV]-Q-[AIV]-[STA]. 

NAME: Seminal vesicle protein I repeats signature. 

CONSENSUS: [IVM]-x-G-Q-D-x-V-K-x(5)-[KN]-G-x(3)-[STLV]. 

NAME: Seminal vesicle protein II repeats signature. 
CONSENSUS: [GSA]-Q-x-K-S-[FY]-x-Q-x-K-[SA].

NAME: Serum amyloid A proteins signature. 

CONSENSUS: A-R-G-N-Y-[ED]-A-x-[QKR]-R-G-x-G-G-x-W-A. 

NAME: Spermadhesins family signature 1.
CONSENSUS: C-G-x(2)-[LI]-x(4)-G-x-I-x(9)-C-x-W-T. 

NAME: Spermadhesins family signature 2. 

CONSENSUS: C-x-K-E-x-[LIVM]-E-[LIVM]-x-[DE]-x(3)-[GS]-x(5)-K-x-C.

NAME: Stress-induced proteins SRP1/TIP1 family signature. 
CONSENSUS: P-W-Y-[ST](2)-R-L. 

NAME: Glypicans signature. 

CONSENSUS: C-x(2)-C-x-G-[LIVM]-x(4)-P-C-x(2)-[FY]-C-x(2)-[LIVM]-x(2)-G-C.
NAME: Syndecans signature. 

CONSENSUS: [FY]-R-[IM]-[KR]-K(2)-D-E-G-S-Y.
NAME: Tissue factor signature. 

CONSENSUS: W-K-x-K-C-x(2)-T-x-[DEN]-T-E-C-D-[LIVM]-T-D-E.
NAME: Translationally controlled tumor protein signature 1.

CONSENSUS: [IA]-G-[GAS]-N-[PA]-S-A-E-[GDE]-[PAGE]-x(0,1)-[DEG]-x-[DEN]-x(2)-[DE].

NAME: Translationally controlled tumor protein signature 2. 
CONSENSUS: [FLMFY]-[IVT)-G-E-x-[MA]-x(2.5MD^ 
CONSENSUS: [DE). 

NAME: Tub family signature 1. 

CONSENSUS: F-[KHQ]-G-R-V-[ST]-x-A-S-V-K-N-F-Q.
NAME: Tub family signature 2. 

CONSENSUS: A-F-[AG]-I-[SAC]-[LIVM]-[ST]-S-F-x-[GST]-K-x-A-C-E.

NAME: HOP repeats signature. 
CONSENSUS: H-R-H-R-G-H-x(2)-[DE](7).
NAME: Bacterial ice-nucleation proteins octamer repeat.
CONSENSUS: A-G-Y-G-S-T-x-T. 

NAME: Cell cycle proteins ftsW / rodA / spoVE signature. 

CONSENSUS: [NV]-x(5)-[GTR]-[LIVMA]-x-P-[inXIVM]-x<3-[LIVM]-x(3)-[LIVMFWl(2)-S^ 
CONSENSUS: G-G-[STN]-[SA], 

NAME: Enterobacterial virulence outer membrane protein signature 1. 
CONSENSUS: G-[LIVMFY]-N-[LIVM]-K-Y-R-Y-E.

NAME: Enterobacterial virulence outer membrane protein signature 2. 
CONSENSUS: [FYW]-x(2)-G-x-G-Y-[KR]-F>.

NAME: Hydrogenases expression/synthesis hypA family signature. 

CONSENSUS: F-[CSA]-[FY]-[DE]-[LIVA](2)-x(3)-[ST]-[LIVM]-x(16)-C-x(2)-C-x(12,15)-
CONSENSUS: C-P-x-C. 

NAME: Hydrogenases expression/synthesis hupF/hypC family signature. 
CONSENSUS: <M-C-[LIV]-[GA]-[LIV]-P-x-[QKR]-[LIV].

NAME: Staphylocoagulase repeat signature. 

CONSENSUS: A-R-P-x(3)-K-x-S-x-T-N-A-Y-N-V-T-T-x(2)-[DN]-G-x(3)-Y-G. 

NAME: 11-S plant seed storage proteins signature.

CONSENSUS: N-G-x-[DE](2)-x-[LIVMF]-C-[ST]-x(11,12)-[PAG]-D.

NAME: Dehydrins signature 1. 

CONSENSUS: S(5)-[DE]-x-[DE]-G-x(1,2)-G-x(0,1)-[KR](4).

NAME: Dehydrins signature 2. 

CONSENSUS: [KR]-[LIM]-K-[DE]-K-[LIM]-P-G.

NAME: Germin family signature. 

CONSENSUS: G-x(4)-H-x-H-P-x-A-x-E-[LIVM]. 

NAME: Oleosins signature. 

CONSENSUS: [AG]-[ST]-x(2)-[AG]-x(2)-[LIVM]-[SAD]-T-P-[LIVMF](4)-F-S-P-[LIVM](3)-
CONSENSUS: P-A. 

NAME: Small hydrophilic plant seed proteins signature. 
CONSENSUS: G-[EQ]-T-V-V-P-G-G-T. 

NAME: Pathogenesis-related proteins Bet v I family signature.

CONSENSUS: G-x(2)-[LIVMF]-x(4)-E-x(2)-[CSTAEN]-x(8,9)-[GND]-G-[GS]-[CS]-x(2)-K-x(4)-
CONSENSUS: [FY]. 

NAME: Pollen proteins Ole e I family signature. 
CONSENSUS: [EQ]-G-x-V-Y-C-D-T-C-R.

NAME: Thaumatin family signature. 

CONSENSUS: G-x-[GF]-x-C-x-T-[GA]-D-C-x(1,2)-G-x(2,3)-C.
NAME: Mrp family signature. 

CONSENSUS: W-x(2)-[LIVM]-D-[LIVMY](4)-D-x-P-P-G-T-[GS]-D.

NAME: Glucose inhibited division protein A family signature 1.
CONSENSUS: [GS]-P-x-Y-C-P-S-[LIVM]-E-x-K-[LIVM]-x-[KR]-F.

NAME: Glucose inhibited division protein A family signature 2. 
CONSENSUS: A-G^-x-[KH-G-x(2)^-Y-x-E-[SAG](3)^ 

NAME: NOL1/NOP2/sun family signature.

CONSENSUS: [FV]-D-[KRA]-[LIVMA]-L-x-D-[AV]-P-C-[ST]-[GA].
NAME: PET112 family signature.

CONSENSUS: [DN]-x-[DN]-R-x(3)-P-L-[LIV]-E-[LIV]-x-[ST]-x-P.
NAME: Protein smpB signature. 

CONSENSUS: [TA]-G-[LIVM]-x-L-x^3-x-E-[LIVM3-CKQ]-[SAJ-[LIVM]. 

NAME: Hypothetical cof family signature 1.

CONSENSUS: tUVFYAN]-[LIVMFA]-x<2)-D-[UVMn-^ 
NAME: Hypothetical cof family signature 2.

CONSENSUS: [LIVMFC]-G-D-[GSANQ]-x-N-D-x(3)-[LIMFY]-x(2)-[AV]-x(2)-[GSCP]-x(2)-
CONSENSUS: [LMP]-x(2)-[GAS]. 

NAME: RIO1/ZK632.3/MJ0444 family signature. 
CONSENSUS: [LIVM]-V-H-[GA]-D-L-S-E-[FY]-N-x-[LIVM].

NAME: SUA5/yciO/yrdC family signature. 

CONSENSUS: [LIVMTA](3)-[LIVMFYC]-[PG]-T-[DE]-[STA]-x-[FY]-[GA]-[LIVM]-[GS].

NAME: Uncharacterized protein family UPF0001 signature. 
CONSENSUS: [FW]-H-[FM]-[IV]-G-x-[LIV]-Q-x-[NKR]-K-x(3)-[LIV].

NAME: Uncharacterized protein family UPF0003 signature. 

CONSENSUS: G-x-V-x(2)-[LIV]-x(3)-[SA]-x(6)-D-x(3)-[LIVT](3)-P-N-x(2)-[LIVMF](2)-
CONSENSUS: x(5)-N. 

NAME: Uncharacterized protein family UPF0004 signature. 

CONSENSUS: [LIVM]-x-[LIVMT]-x(2)-G-C-x(3)-C-[STAN]-[FY]-C-x-[LIVM]-x(4)-G. 
NAME: Uncharacterized protein family UPF0005 signature. 

CONSENSUS: G-[LIVM](2)-[SA]-x(5,8)-G-x(2)-[LIVM]-G-P-x-L-x(4)-[SAG]-x(4,6)-
CONSENSUS: [LIVM](2)-x(2)-A-x(3)-T-A-[LIVM](2)-F.

NAME: Uncharacterized protein family UPF0006 signature 1.
CONSENSUS: [LIVMFY](2)-D-[STA]-H-x-H-[LIVMF]-[DN].

NAME: Uncharacterized protein family UPF0006 signature 2. 
CONSENSUS: P-[LIVM]-x-[LIVM]-H-x-R-x-[TA]-x-[DE].

NAME: Uncharacterized protein family UPF0006 signature 3. 
CONSENSUS: [LVSA]-[UVA]-x(2)-[LIVM]-^x( 

NAME: Uncharacterized protein family UPF0007 signature. 
CONSENSUS: V-L-[IV]-H-D-[GA]-A-R. 

NAME: Uncharacterized protein family UPF0011 signature.
CONSENSUS: S-D-A-G-x-P-x-[LIV]-[SN]-D-P-G.

NAME: Uncharacterized protein family UPF0012 signature. 
CONSENSUS: [GTA]-x(2)-[IVT]-C-Y-D-[LIVM]-x-F-P-x(9)-G.

NAME: Uncharacterized protein family UPF0015 signature. 

CONSENSUS: [DE]-[LIVMF](3)-R-T-[SG]-G-x(2)-R-x-S-x-[FY]-[LIVM](2)-W-Q. 

NAME: Uncharacterized protein family UPF0016 signature. 
CONSENSUS: E-[LIVM]-G-D-K-T-F-[LIVMF](2)-A.

NAME: Uncharacterized protein family UPF0017 signature. 

CONSENSUS: D-x(8)-[GN]-[LFY]-x(4)-[DET]-[LY]-Y-x(3)-[ST]-x(7)-[IV]-x(2)-[PS]-x-
CONSENSUS: [LIVM]-x-[LIVM]-x(3)-[DN]-D.

NAME: Uncharacterized protein family UPF0019 signature. 

CONSENSUS: L-P-V-[YT]-[NQL]-F-[AT]-A-G-G-[LIV]-A-T-P-A-D-A-A-[LM].

NAME: Uncharacterized protein family UPF0020 signature. 
CONSENSUS: D-P-[LIVMF]-C-G-[ST]-G-x(3)-[LI]-E.

NAME: Uncharacterized protein family UPF0021 signature. 
CONSENSUS: C-K-x(2)-F-x(4)-E-x(22,23)-S-G-G-K-D.

NAME: Uncharacterized protein family UPF0023 signature. 
CONSENSUS: D-x-D-E-[LIV]-L-x(4)-V-F-x(3)-S-K-G.

NAME: Uncharacterized protein family UPF0024 signature. 
CONSENSUS: G-x-K-D-[KR]-x-A-[LV]-T-x-Q-x-[LIVF]-[SGC]. 

NAME: Uncharacterized protein family UPF0025 signature. 
CONSENSUS: D-V-[LIV]-x(2)-G-H-[ST]-H-x(12)-[LIVMF]-N-P-G.

NAME: Uncharacterized protein family UPF0027 signature.

CONSENSUS: Q-[LIVM]-x-N-x-A-x-[LIVM]-P-x-I-x(6)-[LIVM]-P-D-x-H-x-G-x-G-x(2)-[IV]-G.
NAME: Uncharacterized protein family UPF0028 signature. 
CONSENSUS: [GA]-[GS]-G-[GA]-A-R-G-x-[SA]-H-x-G-x(9)-[IV]-x-[IV]-D-x(2)-[GA]-G.a-S-
CONSENSUS: x-G. 

NAME: Uncharacterized protein family UPF0029 signature. 

CONSENSUS: G-x(2)-[LIVM](2)-x(2)-[LIVM]-x(4)-[LIVM]-x(5)-[LIVM](2)-x-R-[FYW](2)-G-
CONSENSUS: G-x(2)-[LIVM]-G.

NAME: Uncharacterized protein family UPF0030 signature. 
CONSENSUS: [GA]-L-I-[LIV]-P-G-G-E-S-T-[STA].

NAME: Uncharacterized protein family UPF0031 signature 1. 

CONSENSUS: [SAV]-[IVW]-[LVA]-[LIV]-G-[PNS]-G-L-[GP]-x-[DENQT].

NAME: Uncharacterized protein family UPF0031 signature 2. 
CONSENSUS: [GA]-G-x-G-D-[TV]-[LT]-[STA]-G-x-[LIVM].

NAME: Uncharacterized protein family UPF0032 signature. 

CONSENSUS: Y-x(2)-F-[LIVMA](2)-x-L-x(4)-G-x(2)-F-[EQ]-[LIVMF]-P-[LIVM].

NAME: Uncharacterized protein family UPF0033 signature. 
CONSENSUS: L-[DN]-x(2)-[TAG]-x(2)-C-P-x-P-x-[LIVM].

NAME: Uncharacterized protein family UPF0034 signature. 

CONSENSUS: [LIVM]-[DNG]-[LIVM]-N-x-G-C-P-x(3)-[LIVMASQ]-x(5)-G-[SAC].

NAME: Uncharacterized protein family UPF0035 signature. 
CONSENSUS: L-L-T-x-R-[SA]-x(3)-R-x(3)-G-x(3)-F-P-G-G. 

NAME: Uncharacterized protein family UPF0036 signature. 

CONSENSUS: H-x-S-G-H-[GA]-x(3)-[DE]-x(3)-[LM]-x(5)-P-x(3)-[LIVM]-P-x-H-G-[DE].

NAME: Uncharacterized protein family UPF0038 signature. 
CONSENSUS: G-x-[LT]-x-R-x(2)-L-x(4)-F-x(8)-[LIV]-x(5)-P-x-[LIV].

NAME: Uncharacterized protein family UPF0044 signature. 

CONSENSUS: L-[ST]-x(3)-K-x(3)-[KR]-[SGA]-x-[GA]-H-x-L-x-P-[LIV]-x(2)-[LIV]-[GA]-
CONSENSUS: x(2)-G. 

NAME: Uncharacterized protein family UPF0047 signature. 
CONSENSUS: S-x(2)-[LIV]-x-[LIV]-x(2)-G-x(4)-G-T-W-Q-x-[LIV].

NAME: Uncharacterized protein family UPF0054 signature. 
CONSENSUS: H-[GS]-x-L-H-L-[LI]-G-[FYW]-D-H. 

NAME: Uncharacterized protein family UPF0057 signature. 
CONSENSUS: [LIY]-x-[STA]-[UVIH(3)-P-P^^ 

NAME: Hypothetical YER057c/yjjV family signature. 

CONSENSUS: P-[AT]-R-[SA]-x-[LIVMY]-x(2)-[AK]-x-L-P-x(4)-[LIVM]-E.

NAME: Hypothetical hesB/yadR/yfhF family signature. 

CONSENSUS: F-x-[LIVMFY]-x-N-[PG]-[NSK]-x(4)-C-x-C-[GS]-x-S-F.

NAME: Hypothetical yabO/yceC/sfhB family signature. 

CONSENSUS: [NHY]-R-[LI]-D-x(2)-T-[ST]-G-[LIVMA]-[LIVMF](2)-[LIVMFG]-[SGAC].

We claim:

1. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16cl6; hfbr2_16f21; 
hfbr2_16gl8; hfbr2_16il2; hfbr2_16k22; hfbr2_16112; hfbr2_22f21; hfbr2_22hl3; 
hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; hfbr2_23bl0; hfbr2_23b21; 
hfbr2_23f2; hfbr2_23124; hfbr2_23nl6; hfbr2_23o24; hfbr2_23o5; hfbr2_2a2; hfbr2_2bl7; 
hfbr2_2b5; hfbr2_2cl; hfbr2_2cl7; hfbr2_2cl8; hfbr2_2dl5; hfbr2_2dl7; hfbr2_2d20;
hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0; hfbr2_2il7; hfbr2_2kl4; hfbr2_2kl9; hfbr2_3bl6; 
hfbr2_3cl8; hfbr2_3fl6; hfbr2_3g8; hfbr2_312; hfbr2_41ml5; hfbr2_62bll; hfbr2_62fl0; 
hfbr2_62119; hfbr2_62nl0; hfbr2_62ol7; hfbr2_64all; hfbr2_64al5; hfbr2_64cl6; 
hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64jl8; hfbr2_64k24; hfbr2_64ol6; 
hfbr2_6al7; hfbr2_6b24; hfbr2_6i20; hfbr2_6ol7; hfbr2_71o20; hfbr2_72bl8; 
hfbr2_72dl3; hfbr2_72112; hfbr2_72ml6; hfbr2_72nl2; hfbr2_78c24; hfbr2_78dl3;
hfbr2_78k24; hfbr2_78n23; hfbr2_7a24; hfbr2_7e22; hfbr2_7j4; hfbr2_82c20; 
hfbrl_10c20; hfbr2_82el7; hfbrl_10el7; hfbr2_82e4; hfbrl_10e4; hfbr2_82gl4;
hfbrl_10gl4; hfbr2_82il7; hfbrl_10; hfbr2_82i24; hfbrl_10; hfbr2_82ml6; hfbrl_10;
hfbr2_82m6; hfbrl_10; hfkd2_lj9; hfkd2_24al5; hfkd2_24bl5; hfkd2_24e23;
hfkd2_24n20; hfkd2_24p5; hfkd2_3il3; hfkd2_3ol7; hfkd2_46a6; hfkd2_46bl0; 
hfkd2_46dl3; hfkd2_46j20; hfkd2_46kl9; hfkd2_46m4; hfkd2_47a4; hfkd2_4b6; 
hfkd2_4c8; hfkd2_4kl4; hfkd2_4mll; hmcfl_lall; hmcfl_lc23; hmcfl_lel5;
hmcfl_lgl3; hhtes3_ln3; htes3_14g5; htes3_14h21; htes3_14pl4; htes3_14p7; 
htes3_15al3; Htes3_15c24; htes3_15c6; htes3_15gl4; htes3_15hl; htes3_15i5; 
htes3_15jl8; Htes3_15j3; htes3_15kll; htes3_17fl0; htes3_17117; htes3_17nl2;
htes3_17nl8; Htes3_18f3; htes3_1817; htes3_19fl9; htes3_19jl7; htes3_lcl; htes3_lgl3; 
htes3_lkll; htes3_20c21; htes3_20k2; htes3_20ml8; htes3_21d4; htes3_21jl5; 
htes3_21116; htes3_21n23; htes3_22c23; htes3_22g2; htes3_22nl3; htes3_23111; 
htes3_23nl9; Htes3_23nl9; htes3_26g22; htes3_27dl; htes3_27k4; htes3_27ol4; 
htes3_28dl4; htes3_2all; htes3_2al7; htes3_2dl5; htes3_2el2; htes3_2fl4; htes3_2g7; 
htes3_2hl; htes3_2hl5; htes3_2119; htes3_2ml8; htes3_2m20; htes3_2n9; htes3_2ol3;
htes3_30f4; Htes3_35b4; htes3_35b5; htes3_35e21; htes3_35g6; htes3_35kl6; 
htes3_35k24; htes3_35nl2; htes3_35n24; htes3_35n9; htes3_35pl7; htes3_35p22; 
htes3_4b4; htes3_4fl7; htes3_4f5; htes3_4h6; htes3_4ol9; htes3_50j4; htes3_50n06; 
htes3_50n23; htes3_6b21; htes3_6cll; htes3_6dl6; htes3_72kll; Htes3_72kl5;
htes3_72pl6; htes3_7b22; htes3_7dl7; htes3_7j3; htes3_7j8; htes3_7pl0; htes3_7p9; 
htes3_8e24; Htes3_8gll; Htes3_8g5; htes3_8ml0; Htes3_8p7; Htes3_9e22; Htes3_9i20; 
Htes3_9k22; hutel_17k7; hutel_18cl2; hutel_18il9; hutel_18i4; hutel_1811;
hutel_19fl9; hutel_19gl9; hutel_19g22; hutel_19hl7; hutel_19jll; hutel_li2; 
hutel_20bl9; hutel_20g21; hutel_20hl3; hutel_20mll; hutel_20m24; hutel_21dl5; 
hutel_22d2; hutel_22el2; hutel_22n2; hutel_22o2; hutel_23el3; hutel_23gll; 
hutel_24cl9; hutel_24ell; hutel_24j6; hutel_2h3; their complements; and variants 
thereof. 

2. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16cl6; hfbr2_16f21; 
hfbr2_16gl8; hfbr2_16il2; hfbr2_16k22; hfbr2_16112; hfbr2_22f21; hfbr2_22hl3; 
hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; hfbr2_23bl0; hfbr2_23b21; 
hfbr2_23f2; hfbr2_23124; hfbr2_23nl6; hfbr2_23o24; hfbr2_23o5; hfbr2_2a2;
hfbr2_2bl7; hfbr2_2b5; hfbr2_2cl; hfbr2_2cl7; hfbr2_2cl8; hfbr2_2dl5; hfbr2_2dl7; 
hfbr2_2d20; hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0; hfbr2_2il7; hfbr2_2kl4; hfbr2_2kl9; 
hfbr2_3cl8; hfbr2_3fl6; hfbr2_3g8; hfbr2_312; hfbr2_41ml5; hfbr2_62bll; hfbr2_62fl0; 
hfbr2_62119; hfbr2_62nl0; hfbr2_62ol7; hfbr2_64all; hfbr2_64al5; hfbr2_64cl6; 
hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64jl8; hfbr2_64k24; hfbr2_64ol6; 
hfbr2_6al7; hfbr2_6b24; hfbr2_6i20; hfbr2_6ol7; hfbr2_71o20; hfbr2_72bl8; 
hfbr2_72dl3; hfbr2_72112; hfbr2_72ml6; hfbr2_72nl2; hfbr2_78c24; hfbr2_78dl3; 
hfbr2_78k24; hfbr2_78n23; hfbr2_7a24; hfbr2_7e22; hfbr2_7j4; hfbr2_82c20; 
hfbrl_10c20; hfbr2_82el7; hfbrl_10el7; hfbr2_82e4; hfbrl_10e4; hfbr2_82gl4; 
hfbrl_10gl4; hfbr2_82il7; hfbrl_10; hfbr2_82i24; hfbrl_10; hfbr2_82ml6; hfbrl_10;
hfbr2_82m6; hfbrl_10; their complements; and variants thereof.

3. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16f21; hfbr2_16k22; 
hfbr2_22f21; hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; hfbr2_23f2; hfbr2_23o24;
hfbr2_23o5; hfbr2_2a2; hfbr2_2cl; hfbr2_2cl8; hfbr2_2d20; hfbr2_2gl8; hfbr2_2hl;
hfbr2_2hl0; hfbr2_2kl9; hfbr2_3fl6; hfbr2_312; hfbr2_62nl0; hfbr2_64all; hfbr2_64cl6;
hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64ol6; hfbr2_6al7; hfbr2_6i20; hfbr2_71o20;
hfbr2_72dl3; hfbr2_72ml6; hfbr2_72nl2; hfbr2_78dl3; hfbr2_78n23; hfbr2_7a24;
hfbr2_7e22; hfbr2_7j4; hfbr2_82ml6; and hfbrl_10. 



4. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfkd2_lj9; hfkd2_24al5; 
hfkd2_24bl5; hfkd2_24e23; hfkd2_24n20; hfkd2_24p5; hfkd2_3il3; hfkd2_3ol7; 
hfkd2_46a6; hfkd2_46bl0; hfkd2_46dl3; hfkd2_46j20; hfkd2_46kl9; hfkd2_46m4; 
hfkd2_47a4; hfkd2_4b6; hfkd2_4c8; hfkd2_4kl4; hfkd2_4mll; their complements; and 
variants thereof. 

5. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfkd2_lj9; hfkd2_24e23; 
hfkd2_46a6; hfkd2_46bl0; hfkd2_46dl3; hfkd2_4b6; hfkd2_4c8; their complements; and 
variants thereof. 

6. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hmcfl_lall; hmcfl_lc23; 
hmcfl_lel5; hmcfl_lgl3; their complements; and variants thereof. 

7. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hmcfl_lc23; hmcfl_lgl3; their
complements; and variants thereof. 

8. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hhtes3_ln3; htes3_14g5; 
htes3_14h21; htes3_14pl4; htes3_14p7; htes3_15al3; Htes3_15c24; htes3_15c6; 
htes3_15gl4; htes3_15hl; htes3_15i5; htes3_15jl8; Htes3_15j3; htes3_15kll; 
htes3_17fl0; htes3_17117; htes3_17nl2; htes3_17nl8; Htes3_18f3; htes3_1817; 
htes3_19fl9; htes3_19jl7; htes3_lcl; htes3_lgl3; htes3_lkll; htes3_20c21; htes3_20k2; 
htes3_20ml8; htes3_21d4; htes3_21jl5; htes3_21116; htes3_21n23; htes3_22c23; 
htes3_22g2; htes3_22nl3; htes3_23111; htes3_23nl9; Htes3_23nl9; htes3_26g22; 
htes3_27dl; htes3_27k4; htes3_27ol4; htes3_28dl4; htes3_2all; htes3_2al7; htes3_2dl5; 
htes3_2el2; htes3_2fl4; htes3_2g7; htes3_2hl; htes3_2hl5; htes3_2119; htes3_2ml8; 
htes3_2m20; htes3_2n9; htes3_2ol3; htes3_30f4; Htes3_35b4; htes3_35b5; htes3_35e21; 
htes3_35g6; htes3_35kl6; htes3_35k24; htes3_35nl2; htes3_35n24; htes3_35n9;
htes3_35pl7; htes3_35p22; htes3_4b4; htes3_4fl7; htes3_4f5; htes3_4h6; htes3_4ol9; 
htes3_50j4; htes3_50n06; htes3_50n23; htes3_6b21; htes3_6cll; htes3_6dl6; htes3_72kll;
Htes3_72kl5; htes3_72pl6; htes3_7b22; htes3_7dl7; htes3_7j3; htes3_7j8; htes3_7pl0;
htes3_7p9; htes3_8e24; Htes3_8gll; Htes3_8g5; htes3_8ml0; Htes3_8p7; Htes3_9e22;
Htes3_9i20; Htes3_9k22; their complements; and variants thereof. 

9. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: htes3_14g5; htes3_14pl4; 
htes3_14p7; htes3_15al3; htes3_15gl4; htes3_15hl; htes3_15jl8; htes3_17fl0; Htes3_18f3;
htes3_19fl9; htes3_19jl7; htes3_20c21; htes3_21n23; htes3_22c23; htes3_22nl3;
Htes3_23nl9; htes3_27ol4; htes3_28dl4; htes3_2all; htes3_2dl5; htes3_2fl4; htes3_2g7;
htes3_2hl5; htes3_2119; htes3_2m20; htes3_2n9; htes3_30f4; htes3_35g6; htes3_35n24; 
htes3_35pl7; htes3_4b4; htes3_4fl7; htes3_4ol9; htes3_50j4; htes3_50n23; htes3_50n06; 
htes3_6b21; htes3_6dl6; htes3_72kll; htes3_7dl7; htes3_7j8; Htes3_8gll; Htes3_8g5;
Htes3_8p7; Htes3_9e22; Htes3_9i20; Htes3_9k22; their complements; and variants thereof. 

10. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16gl8; hfbr2_2kl4; 
Htes3_35b4; htes3_35p22; htes3_7j3; htes3_7pl0; hutel_20mll; their complements; and
variants thereof. 

11. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16cl6; hfbr2_2b5; 
htes3_15i5; htes3_1817; htes3_lkll; Htes3_72kl5; htes3_7b22; hutel_19g22; hutel_24j6;
their complements; and variants thereof. 

12. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_2dl5; htes3_35e21; 
hutel_2h3; their complements; and variants thereof. 

13. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_23124; hfbr2_2il7; 
hfbr2_41ml5; hfbr2_62fl0; hfbr2_62119; hfbr2_64jl8; 
hfkd2_24n20; hfkd2_24p5; hfkd2_4kl4; htes3_lgl3; htes3_21116; htes3_23111;
htes3_26g22; htes3_4h6; htes3_72pl6; hutel_19hl7; hutel_20hl3; hutel_24ell; their
complements; and variants thereof. 

14. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_3g8; hfbr2_62ol7; 
hfbr2_6b24; hfbr2_78k24; hfkd2_24bl5; hfkd2_3ol7; hfkd2_46j20; htes3_17117;
htes3_17nl8; htes3_27dl; htes3_2al7; htes3_35b5; htes3_35kl6; htes3_35nl2; 
htes3_35n9; hutel_20bl9; hutel_20m24; hutel_23el3; their complements; and variants 
thereof. 

15. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_23bl0; hfbr2_3cl8; 
hfbr2_64al5; hfbr2_6ol7; hfbr2_72bl8; hfbr2_72112; hfbr2_82i24 (hfbrl_10);
htes3_14h21; Htes3_15j3; htes3_20ml8; htes3_22g2; htes3_2ml8; htes3_7p9; 
htes3_8ml0; hutel_1811; their complements; and variants thereof. 

16. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_23b21; hfbr2_23nl6; 
hfbr2_2cl7; hfbr2_62bll; hfbr2_78c24; hfbr2_82e4 (hfbrl_10e4); hfbr2_82il7
(hfbrl_10); hfbr2_82m6 (hfbrl_10); hfkd2_46m4; htes3_15kll; htes3_lcl; hhtes3_ln3;
htes3_20k2; htes3_21d4; htes3_23nl9; htes3_4f5; htes3_6cll; htes3_8e24; hutel_20g21; 
hutel_22d2; hutel_22el2; their complements; and variants thereof. 

17. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16il2; hfbr2_16112; 
hfbr2_22hl3; hfbr2_2bl7; hfbr2_2dl7; hfbr2_64k24; hfbr2_82c20 (hfbrl_10c20); 
hfbr2_82el7 (hfbrl_10el7); hfbr2_82gl4 (hfbrl_10gl4); hfkd2_24al5; hfkd2_3il3;
hfkd2_4mll; hmcfl_lall; hmcfl_lel5; htes3_15c6; htes3_2ol3; htes3_27k4; htes3_2hl; 
htes3_35k24; hutel_19fl9; hutel_24cl9; their complements; and variants thereof.

18. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfkd2_46kl9; hfkd2_47a4; 
htes3_2el2; htes3_21jl5; htes3_17nl2; hutel_18il9; hutel_li2; their complements; and
variants thereof. 

19. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hutel_17k7; hutel_18cl2; 
hutel_18il9; hutel_18i4; hutel_1811; hutel_19fl9; hutel_19gl9; hutel_19g22; 
hutel_19hl7; hutel_19jll; hutel_li2; hutel_20bl9; hutel_20g21; hutel_20hl3;
hutel_20mll; hutel_20m24; hutel_21dl5; hutel_22d2; hutel_22el2; hutel_22n2; 
hutel_22o2; hutel_23el3; hutel_23gll; hutel_24cl9; hutel_24ell; hutel_24j6;
hutel_2h3; their complements; and variants thereof. 

20. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hutel_17k7; hutel_18cl2; 
hutel_18i4; hutel_19gl9; hutel_19jll; hutel_22n2; hutel_21dl5; hutel_22o2; 
hutel_23gll; their complements; and variants thereof. 

21. A computer readable medium, comprising in electronic form at least one
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16cl6; hfbr2_16f21; hfbr2_16gl8; hfbr2_16il2; hfbr2_16k22; hfbr2_16112; 
hfbr2_22f21; hfbr2_22hl3; hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; 
hfbr2_23bl0; hfbr2_23b21; hfbr2_23f2; hfbr2_23124; hfbr2_23nl6; hfbr2_23o24;
hfbr2_23o5; hfbr2_2a2; hfbr2_2bl7; hfbr2_2b5; hfbr2_2cl; hfbr2_2cl7; hfbr2_2cl8; 
hfbr2_2dl5; hfbr2_2dl7; hfbr2_2d20; hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0; hfbr2_2il7; 
hfbr2_2kl4; hfbr2_2kl9; hfbr2_3cl8; hfbr2_3fl6; hfbr2_3g8; hfbr2_312; hfbr2_41ml5; 
hfbr2_62bll; hfbr2_62fl0; hfbr2_62119; hfbr2_62nl0; hfbr2_62ol7; hfbr2_64all; 
hfbr2_64al5; hfbr2_64cl6; hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64jl8;
hfbr2_64k24; hfbr2_64ol6; hfbr2_6al7; hfbr2_6b24; hfbr2_6i20; hfbr2_6ol7; 
hfbr2_71o20; hfbr2_72bl8; hfbr2_72dl3; hfbr2_72112; hfbr2_72ml6; hfbr2_72nl2;
hfbr2_78c24; hfbr2_78dl3; hfbr2_78k24; hfbr2_78n23; hfbr2_7a24; hfbr2_7e22; 
hfbr2_7j4; hfbr2_82c20; hfbrl_10c20; hfbr2_82el7; hfbrl_10el7; hfbr2_82e4;
hfbrl_10e4; hfbr2_82gl4; hfbrl_10gl4; hfbr2_82il7; hfbrl_10; hfbr2_82i24; hfbrl_10;
hfbr2_82ml6; hfbrl_10; hfbr2_82m6; hfbrl_10; hfkd2_lj9; hfkd2_24al5; hfkd2_24bl5;
hfkd2_24e23; hfkd2_24n20; hfkd2_24p5; hfkd2_3il3; hfkd2_3ol7; hfkd2_46a6; 
hfkd2_46bl0; hfkd2_46dl3; hfkd2_46j20; hfkd2_46kl9; hfkd2_46m4; hfkd2_47a4;
hfkd2_4b6; hfkd2_4c8; hfkd2_4kl4; hfkd2_4mll; hmcfl_lall; hmcfl_lc23; hmcfl_lel5; 
hmcfl_lgl3; hhtes3_ln3; htes3_14g5; htes3_14h21; htes3_14pl4; htes3_14p7; 
htes3_15al3; Htes3_15c24; htes3_15c6; htes3_15gl4; htes3_15hl; htes3_15i5; 
htes3_15jl8; Htes3_15j3; htes3_15kll; htes3_17fl0; htes3_17117; htes3_17nl2; 
htes3_17nl8; Htes3_18f3; htes3_1817; htes3_19fl9; htes3_19jl7; htes3_lcl; htes3_lgl3; 
htes3_lkll; htes3_20c21; htes3_20k2; htes3_20ml8; htes3_21d4; htes3_21jl5; 
htes3_21116; htes3_21n23; htes3_22c23; htes3_22g2; htes3_22nl3; htes3_23111; 
htes3_23nl9; Htes3_23nl9; htes3_26g22; htes3_27dl; htes3_27k4; htes3_27ol4; 
htes3_28dl4; htes3_2all; htes3_2al7; htes3_2dl5; htes3_2el2; htes3_2fl4; htes3_2g7; 
htes3_2hl; htes3_2hl5; htes3_2119; htes3_2ml8; htes3_2m20; htes3_2n9; htes3_2ol3; 
htes3_30f4; Htes3_35b4; htes3_35b5; htes3_35e21; htes3_35g6; htes3_35kl6; 
htes3_35k24; htes3_35nl2; htes3_35n24; htes3_35n9; htes3_35pl7; htes3_35p22; 
htes3_4b4; htes3_4fl7; htes3_4f3; htes3_4h6; htes3_4ol9; htes3_50j4; htes3_50n06; 
htes3_50n23; htes3_6b21; htes3_6cll; htes3_6dl6; htes3_72kll; Htes3_72kl5; 
htes3_72pl6; htes3_7b22; htes3_7dl7; htes3_7j3; htes3_7j8; htes3_7pl0; htes3_7p9; 
htes3_8e24; Htes3_8gll; Htes3_8g5; htes3_8ml0; Htes3_8p7; Htes3_9e22; Htes3_9i20;
Htes3_9k22; hutel_17k7; hutel_18cl2; hutel_18il9; hutel_18i4; hutel_1811; 
hutel_19fl9; hutel_19gl9; hutel_19g22; hutel_19hl7; hutel_19jll; hutel_li2; 
hutel_20bl9; hutel_20g21; hutel_20hl3; hutel_20mll; hutel_20m24; hutel_21dl5; 
hutel_22d2; hutel_22el2; hutel_22n2; hutel_22o2; hutel_23el3; hutel_23gll; 
hutel_24cl9; hutel_24ell; hutel_24j6; hutel_2h3; their complements; and variants 
thereof. 

22. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16cl6; hfbr2_16f21; hfbr2_16gl8; hfbr2_16il2; hfbr2_16k22; hfbr2_16112; 
hfbr2_22f21; hfbr2_22hl3; hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; 
hfbr2_23bl0; hfbr2_23b21; hfbr2_23f2; hfbr2_23124; hfbr2_23nl6; hfbr2_23o24; 
hfbr2_23o5; hfbr2_2a2; hfbr2_2bl7; hfbr2_2b5; hfbr2_2cl; hfbr2_2cl7; hfbr2_2cl8; 
hfbr2_2dl5; hfbr2_2dl7; hfbr2_2d20; hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0; hfbr2_2il7; 
hfbr2_2kl4; hfbr2_2kl9; hfbr2_3cl8; hfbr2_3fl6; hfbr2_3g8; hfbr2_312; hfbr2_41ml5; 
hfbr2_62bll; hfbr2_62fl0; hfbr2_62119; hfbr2_62nl0; hfbr2_62ol7; hfbr2_64all; 
hfbr2_64al5; hfbr2_64cl6; hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64jl8; 
hfbr2_64k24; hfbr2_64ol6; hfbr2_6al7; hfbr2_6b24; hfbr2_6i20; hfbr2_6ol7; 
hfbr2_71o20; hfbr2_72bl8; hfbr2_72dl3; hfbr2_72112; hfbr2_72ml6; hfbr2_72nl2; 
hfbr2_78c24; hfbr2_78dl3; hfbr2_78k24; hfbr2_78n23; hfbr2_7a24; hfbr2_7e22; 
hfbr2_7j4; hfbr2_82c20; hfbrl_10c20; hfbr2_82el7; hfbrl_10el7; hfbr2_82e4; 
hfbrl_10e4; hfbr2_82gl4; hfbrl_10gl4; hfbr2_82il7; hfbrl_10; hfbr2_82i24; hfbrl_10; 
hfbr2_82ml6; hfbrl_10; hfbr2_82m6; hfbrl_10; complements of the nucleic acid 
sequences; and variants thereof. 

23. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16f21; hfbr2_16k22; hfbr2_22f21; hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; 
hfbr2_23f2; hfbr2_23o24; hfbr2_23o5; hfbr2_2a2; hfbr2_2cl; hfbr2_2cl8; hfbr2_2d20; 
hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0; hfbr2_2kl9; hfbr2_3fl6; hfbr2_312; hfbr2_62nl0; 
hfbr2_64all; hfbr2_64cl6; hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64k24; 
hfbr2_64ol6; hfbr2_6al7; hfbr2_6i20; hfbr2_71o20; hfbr2_72dl3; hfbr2_72ml6; 
hfbr2_72nl2; hfbr2_78dl3; hfbr2_78n23; hfbr2_7a24; hfbr2_7e22; hfbr2_7j4; hfbr2_82ml6; 
hfbrl_10; complements of the nucleic acid sequences; and variants thereof. 

24. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfkd2_lj9; hfkd2_24al5; hfkd2_24bl5; hfkd2_24e23; hfkd2_24n20; hfkd2_24p5; 
hfkd2_3il3; hfkd2_3ol7; hfkd2_46a6; hfkd2_46bl0; hfkd2_46dl3; hfkd2_46j20; 
hfkd2_46kl9; hfkd2_46m4; hfkd2_47a4; hfkd2_4b6; hfkd2_4c8; hfkd2_4kl4; 
hfkd2_4mll; complements of the nucleic acid sequences; and variants thereof. 

25. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: hfkd2_lj9; 
hfkd2_24e23; hfkd2_46a6; hfkd2_46bl0; hfkd2_46dl3; hfkd2_4b6; hfkd2_4c8; 
complements of the nucleic acid sequences; and variants thereof. 

26. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hmcfl_lall; hmcfl_lc23; hmcfl_lel5; hmcfl_lgl3; complements of the nucleic acid 
sequences; and variants thereof. 

27. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hmcfl_lc23; hmcfl_lgl3; complements of the nucleic acid sequences; and variants thereof. 

28. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hhtes3_ln3; htes3_14g5; htes3_14h21; htes3_14pl4; htes3_14p7; htes3_15al3; 
Htes3_15c24; htes3_15c6; htes3_15gl4; htes3_15hl; htes3_15i5; htes3_15jl8; Htes3_15j3; 
htes3_15kll; htes3_17fl0; htes3_17117; htes3_17nl2; htes3_17nl8; Htes3_18f3; 
htes3_1817; htes3_19fl9; htes3_19jl7; htes3_lcl; htes3_lgl3; htes3_lkll; htes3_20c21; 
htes3_20k2; htes3_20ml8; htes3_21d4; htes3_21jl5; htes3_21116; htes3_21n23; 
htes3_22c23; htes3_22g2; htes3_22nl3; htes3_23111; htes3_23nl9; Htes3_23nl9; 
htes3_26g22; htes3_27dl; htes3_27k4; htes3_27ol4; htes3_28dl4; htes3_2all; 
htes3_2al7; htes3_2dl5; htes3_2el2; htes3_2fl4; htes3_2g7; htes3_2hl; htes3_2hl5; 
htes3_2119; htes3_2ml8; htes3_2m20; htes3_2n9; htes3_2ol3; htes3_30f4; Htes3_35b4; 
htes3_35b5; htes3_35e21; htes3_35g6; htes3_35kl6; htes3_35k24; htes3_35nl2; 
htes3_35n24; htes3_35n9; htes3_35pl7; htes3_35p22; htes3_4b4; htes3_4fl7; htes3_4f5; 
htes3_4h6; htes3_4ol9; htes3_50j4; htes3_50n06; htes3_50n23; htes3_6b21; htes3_6cll; 
htes3_6dl6; htes3_72kll; Htes3_72kl5; htes3_72pl6; htes3_7b22; htes3_7dl7; htes3_7j3; 
htes3_7j8; htes3_7pl0; htes3_7p9; htes3_8e24; Htes3_8gll; Htes3_8g5; htes3_8ml0; 
Htes3_8p7; Htes3_9e22; Htes3_9i20; Htes3_9k22; complements of the nucleic acid 
sequences; and variants thereof. 

29. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: htes3_14g5; 
htes3_14pl4; htes3_14p7; htes3_15al3; htes3_15gl4; htes3_15hl; htes3_15jl8; 
htes3_17fl0; htes3_17nl8; Htes3_18f3; htes3_19fl9; htes3_19jl7; htes3_20c21; 
htes3_21n23; htes3_22c23; htes3_22nl3; Htes3_23nl9; htes3_27ol4; htes3_28dl4; 
htes3_2all; htes3_2dl5; htes3_2fl4; htes3_2g7; htes3_2hl5; htes3_2119; htes3_2m20; 
htes3_2n9; htes3_30f4; htes3_35g6; htes3_35n24; htes3_35pl7; htes3_4b4; htes3_4fl7; 
htes3_4ol9; htes3_50j4; htes3_50n23; htes3_50n06; htes3_6b21; htes3_6dl6; htes3_72kll; 
htes3_7dl7; htes3_7j8; Htes3_8gll; Htes3_8g5; Htes3_8p7; Htes3_9e22; Htes3_9i20; 
Htes3_9k22; complements of the nucleic acid sequences; and variants thereof. 

30. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16gl8; hfbr2_2kl4; Htes3_35b4; htes3_35p22; htes3_7j3; htes3_7pl0; 
hutel_20mll; complements of the nucleic acid sequences; and variants thereof. 

31. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16cl6; hfbr2_2b5; htes3_15i5; htes3_1817; htes3_lkll; Htes3_72kl5; htes3_7b22; 
hutel_19g22; hutel_24j6; complements of the nucleic acid sequences; and variants thereof. 

32. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_2dl5; htes3_35e21; hutel_2h3; complements of the nucleic acid sequences; and 
variants thereof. 

33. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_23124; hfbr2_2il7; hfbr2_41ml5; hfbr2_62fl0; hfbr2_62119; hfbr2_64jl8; 
hfkd2_24n20; hfkd2_24p5; hfkd2_4kl4; htes3_lgl3; htes3_21116; htes3_23111; 
htes3_26g22; htes3_4h6; htes3_72pl6; hutel_19hl7; hutel_20hl3; hutel_24ell; 
complements of the nucleic acid sequences; and variants thereof. 

34. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_3g8; hfbr2_62ol7; hfbr2_6b24; hfbr2_78k24; hfkd2_24bl5; hfkd2_3ol7; 
hfkd2_46j20; htes3_17117; Htes3_17nl8; htes3_27dl; htes3_2al7; htes3_35b5; 
htes3_35kl6; htes3_35nl2; htes3_35n9; hutel_20bl9; hutel_20m24; hutel_23el3; 
complements of the nucleic acid sequences; and variants thereof. 

35. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_23bl0; hfbr2Jcl8; hfbr2_64al5; hfbr2_6ol7; hfbr2_72bl8; hfbr2_72112; 
hfbr2_82i24 (hfbrl_10); htes3_14h21; Htes3_15j3; htes3_20ml8; htes3_22g2; htes3_2ml8; 
htes3_7p9; htes3_8ml0; hutel_1811; complements of the nucleic acid sequences; and 
variants thereof. 

36. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_23b21; hfbr2_23nl6; hfbr2_2cl7; hfbr2_62bll; hfbr2_78c24; hfbr2_82e4 
(hfbrl_10e4); hfbr2_82il7 (hfbrl_10); hfbr2_82m6 (hfbrl_10); hfkd2_46m4; htes3_15kll; 
htes3_lcl; hhtes3_ln3; htes3_20k2; htes3_21d4; htes3_23nl9; htes3_4f5; htes3_6cll; 
htes3_8e24; hutel_20g21; hutel_22d2; hutel_22el2; complements of the nucleic acid 
sequences; and variants thereof. 

37. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16il2; hfbr2_16112; hfbr2_22hl3; hfbr2_2bl7; hfbr2_2dl7; hfbr2_64k24; 
hfbr2_82c20 (hfbrl_10c20); hfbr2_82el7 (hfbrl_10el7); hfbr2_82gl4 (hfbrl_10gl4); 
hfkd2_24al5; hfkd2_3il3; hfkd2_4mll; hmcfl_lall; hmcfl_lel5; htes3_15c6; 
htes3_2ol3; htes3_27k4; htes3_2hl; htes3_35k24; hutel_19fl9; and hutel_24cl9; 
complements of the nucleic acid sequences; and variants thereof. 

38. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfkd2_46kl9; hfkd2_47a4; htes3_2el2; htes3_21jl5; htes3_17nl2; hutel_18il9; 
hutel_li2; complements of the nucleic acid sequences; and variants thereof. 

39. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hutel_17k7; hutel_18cl2; hutel_18il9; hutel_18i4; hutel_1811; hutel_19fl9; 
hutel_19gl9; hutel_19g22; hutel_19hl7; hutel_19jll; hutel_li2; hutel_20bl9; 
hutel_20g21; hutel_20hl3; hutel_20mll; hutel_20m24; hutel_21dl5; hutel_22d2; 
hutel_22el2; hutel_22n2; hutel_22o2; hutel_23el3; hutel_23gll; hutel_24cl9; 
hutel_24ell; hutel_24j6; hutel_2h3; complements of the nucleic acid sequences; and 
variants thereof. 

40. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hutel_17k7; hutel_18cl2; hutel_18i4; hutel_19gl9; hutel_19jll; hutel_22n2; 
hutel_21dl5; hutel_22o2; hutel_23gll; complements of the nucleic acid sequences; and 
variants thereof. 

41. A nucleic acid molecule having the sequence of a clone selected from the 
group consisting of hfbr2_16cl6; hfbr2_16f21; hfbr2_16gl8; hfbr2_16il2; hfbr2_16k22; 
hfbr2_16112; hfbr2_22f21; hfbr2_22hl3; hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; 
hfbr2_22k8; hfbr2_23bl0; hfbr2_23b21; hfbr2_23f2; hfbr2_23124; hfbr2_23nl6; 
hfbr2_23o24; hfbr2_23o5; hfbr2_2a2; hfbr2_2bl7; hfbr2_2b5; hfbr2_2cl; hfbr2_2cl7; 
hfbr2_2cl8; hfbr2_2dl5; hfbr2_2dl7; hfbr2_2d20; hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0; 
hfbr2_2il7; hfbr2_2kl4; hfbr2_2kl9; hfbr2_3bl6; hfbr2_3cl8; hfbr2_3fl6; hfbr2_3g8; 
hfbr2_312; hfbr2_41ml5; hfbr2_62bll; hfbr2_62fl0; hfbr2_62119; hfbr2_62nl0; 
hfbr2_62ol7; hfbr2_64all; hfbr2_64al5; hfbr2_64cl6; hfbr2_64c4; hfbr2_64h6; 
hfbr2_64i20; hfbr2_64jl8; hfbr2_64k24; hfbr2_64ol6; hfbr2_6al7; hfbr2_6b24; 
hfbr2_6i20; hfbr2_6ol7; hfbr2_71o20; hfbr2_72bl8; hfbr2_72dl3; hfbr2_72112; 
hfbr2_72ml6; hfbr2_72nl2; hfbr2_78c24; hfbr2_78dl3; hfbr2_78k24; hfbr2_78n23; 
hfbr2_7a24; hfbr2_7e22; hfbr2_7j4; hfbr2_82c20; hfbrl_10c20; hfbr2_82el7; 
hfbrl_10el7; hfbr2_82e4; hfbrl_10e4; hfbr2_82gl4; hfbrl_10gl4; hfbr2_82il7; 
hfbrl_10; hfbr2_82i24; hfbrl_10; hfbr2_82ml6; hfbrl_10; hfbr2_82m6; hfbrl_10; 
hfkd2_lj9; hfkd2_24al5; hfkd2_24bl5; hfkd2_24e23; hfkd2_24n20; hfkd2_24p5; 
hfkd2_3il3; hfkd2_3ol7; hfkd2_46a6; hfkd2_46bl0; hfkd2_46dl3; hfkd2_46j20; 
hfkd2_46kl9; hfkd2_46m4; hfkd2_47a4; hfkd2_4b6; hfkd2_4c8; hfkd2_4kl4; 
hfkd2_4mll; hmcfl_lall; hmcfl_lc23; hmcfl_lel5; hmcfl_lgl3; hhtes3_ln3; 
htes3_14g5; htes3_14h21; htes3_14pl4; htes3_14p7; htes3_15al3; Htes3_15c24; 
htes3_15c6; htes3_15gl4; htes3_15hl; htes3_15i5; htes3_15jl8; Htes3_15j3; htes3_15kll; 
htes3_17fl0; htes3_17117; htes3_17nl2; htes3_17nl8; Htes3_18f3; htes3_1817; 
htes3_19fl9; htes3_19jl7; htes3_lcl; htes3_lgl3; htes3_lkll; htes3_20c21; htes3_20k2; 
htes3_20ml8; htes3_21d4; htes3_21jl5; htes3_21116; htes3_21n23; htes3_22c23; 
htes3_22g2; htes3_22nl3; htes3_23111; htes3_23nl9; Htes3_23nl9; htes3_26g22; 
htes3_27dl; htes3_27k4; htes3_27ol4; htes3_28dl4; htes3_2all; htes3_2al7; htes3_2dl5; 
htes3_2el2; htes3_2fl4; htes3_2g7; htes3_2hl; htes3_2hl5; htes3_2119; htes3_2ml8; 
htes3_2m20; htes3_2n9; htes3_2ol3; htes3_30f4; Htes3_35b4; htes3_35b5; htes3_35e21; 
htes3_35g6; htes3_35kl6; htes3_35k24; htes3_35nl2; htes3_35n24; htes3_35n9; 
htes3_35pl7; htes3_35p22; htes3_4b4; htes3_4fl7; htes3_4f5; htes3_4h6; htes3_4ol9; 
htes3_50j4; htes3_50n06; htes3_50n23; htes3_6b21; htes3_6cll; htes3_6dl6; htes3_72kll; 
Htes3_72kl5; htes3_72pl6; htes3_7b22; htes3_7dl7; htes3_7j3; htes3_7j8; htes3_7pl0; 
htes3_7p9; htes3_8e24; Htes3_8gll; Htes3_8g5; htes3_8ml0; Htes3_8p7; Htes3_9e22; 
Htes3_9i20; Htes3_9k22; hutel_17k7; hutel_18cl2; hutel_18il9; hutel_18i4; hutel_1811; 
hutel_19fl9; hutel_19gl9; hutel_19g22; hutel_19hl7; hutel_19jll; hutel_li2; 
hutel_20bl9; hutel_20g21; hutel_20hl3; hutel_20mll; hutel_20m24; hutel_21dl5; 
hutel_22d2; hutel_22el2; hutel_22n2; hutel_22o2; hutel_23el3; hutel_23gll; 
hutel_24cl9; hutel_24ell; hutel_24j6; hutel_2h3; their complements; and variants 
thereof. 

42. A polypeptide encoded by the nucleic acid molecule according to claim 41. 

43. An antibody or fragment thereof that is capable of binding to a specific portion 
of the peptide according to claim 42. 

44. A pharmaceutical composition, comprising (a) an effective amount of a 
pharmaceutical agent, wherein said pharmaceutical agent is selected from the group consisting 
of the polypeptide according to claim 42, variants or functional derivatives thereof, and 
antibodies thereto; and (b) a physiologically acceptable carrier or excipient. 

45. An expression vector comprising the nucleic acid molecule of claim 41 or a 
fragment thereof, and optionally a promoter operably linked to said nucleic acid molecule or 
said fragment. 

46. A method for recombinantly producing a desired peptide, comprising expressing 
in a host cell a peptide encoded by the nucleic acid molecule according to claim 41. 

1 — 1 searchable claims. 



2. As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite payment 
of any additional fee. 



3. As only some of the required additional search fees were timely paid by the applicant, this International Search Report 
covers only those claims for which fees were paid, specifically claims Nos.: 



4. [X] No required additional search fees were timely paid by the applicant. Consequently, this International Search Report is 
restricted to the invention first mentioned in the claims; it is covered by claims Nos.: 




Claims Nos.: 21-40 

because they relate to subject matter not required to be searched by this Authority, namely: 




because they relate to parts of the International Application that do not comply with the prescribed requirements to such 
an extent that no meaningful International Search can be carried out, specifically: 




1-46 all partially 



Remark on Protest 





No protest accompanied the payment of additional search fees. 



Form PCT/ISA/210 (continuation of first sheet (1)) (July 1998) 

FURTHER INFORMATION CONTINUED FROM PCT/ISA/ 210 



1. Claims: 1-46, all partially 
Invention 1: 

A nucleic acid molecule having the sequence of the clone 
hfbr2_16cl6 (corresponding to SEQ.ID.1); an assemblage 
comprising said nucleic acid; a computer readable medium 
comprising said nucleic acid; a polypeptide encoded by said 
nucleic acid; an antibody binding to said polypeptide; an 
expression vector comprising said nucleic acid and a method 
for producing said polypeptide. 



2. Claims: 1-46, all partially 
Invention 2-233: 

same as invention 1, but for each single clone as set forth 
in claim 1 (i.e. starting with clone hfbr2_16f21 and ending 
with clone hutel_2h3) 



NB: for the sake of conciseness, the first subject-matter is 
explicitly defined, the other subject-matter by analogy 
thereto. 